Opened https://issues.apache.org/jira/browse/KAFKA-1489 .
Regards,
András

On 6/11/2014 6:19 AM, Jun Rao wrote:
Could you file a jira to track this?

Thanks,
Jun

On Tue, Jun 10, 2014 at 8:22 AM, András Serény <sereny.and...@gravityrd.com> wrote:

Hi Kafka devs,

are there currently any plans to implement the global threshold feature? Is there a JIRA about it? We are considering implementing a solution for this issue (either inside or outside of Kafka).

Thanks a lot,
András

On 5/30/2014 11:45 AM, András Serény wrote:

Sorry for the delay on this.

Yes, that's right -- it'd be just another term in the chain of 'or' conditions. Currently it's <time limit> OR <size limit>. With the global condition, it would be <time limit> OR <size limit> OR <global size limit>.

In my view, that's fairly simple and intuitive, hence a fine piece of logic.

Regards,
András

On 5/27/2014 4:34 PM, Jun Rao wrote:

For log.retention.bytes.per.topic and log.retention.hours.per.topic, the current interpretation is that those are tight bounds. In other words, a segment is deleted only when those thresholds are violated. To further satisfy log.retention.bytes.global, the per topic thresholds may no longer be tight, i.e., we may need to delete a segment even when the per topic threshold is not violated.

Thanks,
Jun

On Tue, May 27, 2014 at 12:22 AM, András Serény <sereny.and...@gravityrd.com> wrote:

No, I think more specific settings should get a chance first. I'm suggesting that, provided that there is a segment rolled for a topic, a violation of *any* of log.retention.bytes.per.topic, log.retention.hours.per.topic, and a future log.retention.bytes.global would cause segments to be deleted.

As far as I understand, the current logic says

(1)
for each topic, if there is a segment already rolled {
    mark segments eligible for deletion due to log.retention.hours.for.this.topic
    if log.retention.bytes.for.this.topic is still violated, mark segments eligible for deletion due to log.retention.bytes.for.this.topic
}

After this cleanup cycle, there could be another one, taking into account the global threshold.
For instance, something along the lines of

(2)
if after (1) log.retention.bytes.global is still violated,
for each topic, if there is a segment already rolled {
    calculate the required size for this topic (e.g. the proportional size, or simply (full size - threshold)/#topics ?)
    mark segments exceeding the required size for deletion
}

Regards,
András

On 5/23/2014 4:46 PM, Jun Rao wrote:

Yes, that's possible. There is a default log.retention.bytes for every topic. By introducing a global threshold, we may have to delete data from logs whose size is smaller than log.retention.bytes. So, are you saying that the global threshold has precedence?

Thanks,
Jun

On Fri, May 23, 2014 at 2:26 AM, András Serény <sereny.and...@gravityrd.com> wrote:

Hi Kafka users,

this feature would also be very useful for us. With lots of topics of different volume (and as they grow in number) it could become tedious to maintain topic level settings.

As a start, I think uniform reduction is a good idea. Logs wouldn't be retained as long as you want, but that's already the case when a log.retention.bytes setting is specified. As for early rolling, I don't think it's necessary: currently, if there is no log segment eligible for deletion, log.retention.bytes and log.retention.hours settings won't kick in, so it's possible to exceed these limits, which is completely fine (please correct me if I'm mistaken here).

All in all, introducing a global threshold doesn't seem to induce a considerable change in current retention logic.

Regards,
András

On 5/8/2014 2:00 AM, vinh wrote:

Agreed…a global knob is a bit tricky for exactly the reason you've identified. Perhaps the problem could be simplified though by considering the context and purpose of Kafka. I would use a persistent message queue because I want to guarantee that data/messages don't get lost.
But, since Kafka is not meant to be a long term storage solution (other products can be used for that), I would clarify that guarantee to apply only to the most recent messages up until a certain configured threshold (e.g. max 24 hrs, max 500GB, etc). Once those thresholds are reached, old messages are deleted first.

To ensure no message loss (up to a limit), I must ensure Kafka is highly available. There's a small chance that the message deletion rate is the same as the receive rate. For example, when the incoming volume is so high that the size threshold is reached before the time threshold. But, I may be ok with that because if Kafka goes down, it can cause upstream applications to fail. This can result in higher losses overall, and particularly of the most *recent* messages.

In other words, in a persistent but ephemeral message queue, I would give higher precedence to recent messages over older ones. On the flip side, by allowing Kafka to go down when a disk is full, applications are forced to deal with the issue. This adds complexity to apps, but perhaps it's not a bad thing. After all, in scalability, all apps should be designed to handle failure.

Having said that, next is to decide which messages to delete first. I believe that's a separate issue and has its own complexities, too.

The main idea though is that a global knob would provide flexibility, even if not used. From an operation perspective, if we can't ensure HA for all applications/components, it would be good if we can for at least some of the core ones, like Kafka. This is much easier said than done though.

On May 5, 2014, at 9:16 AM, Jun Rao <jun...@gmail.com> wrote:

Yes, your understanding is correct. A global knob that controls aggregate log size may make sense. What would be the expected behavior when that limit is reached? Would you reduce the retention uniformly across all topics? Then, it just means that some of the logs may not be retained as long as you want.
Also, we need to think through what happens when every log has only 1 segment left and yet the total size still exceeds the limit. Do we roll log segments early?

Thanks,
Jun

On Sun, May 4, 2014 at 4:31 AM, vinh <v...@loggly.com> wrote:

Thanks Jun. So if I understand this correctly, there really is no master property to control the total aggregate size of all Kafka data files on a broker.

log.retention.size and log.file.size are great for managing data at the application level. In our case, application needs change frequently, and performance itself is an ever evolving feature. This means various configs are constantly changing, like topics, # of partitions, etc.

What rarely changes though is provisioned hardware resources. So a setting to control the total aggregate size of Kafka logs (or persisted data, for better clarity) would definitely simplify things at an operational level, regardless what happens at the application level.

On May 2, 2014, at 7:49 AM, Jun Rao <jun...@gmail.com> wrote:

log.retention.size controls the total size in a log dir (per partition). log.file.size controls the size of each log segment in the log dir.

Thanks,
Jun

On Thu, May 1, 2014 at 9:31 PM, vinh <v...@loggly.com> wrote:

In the 0.7 docs, the description for log.retention.size and log.file.size sound very much the same. In particular, that they apply to a single log file (or log segment file).

http://kafka.apache.org/07/configuration.html

I'm beginning to think there is no setting to control the max aggregate size of all logs. If this is correct, what would be a good approach to enforce this requirement? In my particular scenario, I have a lot of data being written to Kafka at a very high rate. So a 1TB disk can easily be filled up in 24hrs or so. One option is to add more Kafka brokers to add more disk space to the pool, but I'd like to avoid that and see if I can simply configure Kafka to not write more than 1TB aggregate.
Else, Kafka will OOM and kill itself, and possibly crash the node itself because the disk is full.

On May 1, 2014, at 9:21 PM, vinh <v...@loggly.com> wrote:

Using Kafka 0.7.2, I have the following in server.properties:

log.retention.hours=48
log.retention.size=107374182400
log.file.size=536870912

My interpretation of this is:
a) a single log segment file over 48hrs old will be deleted
b) the total combined size of *all* logs is 100GB
c) a single log segment file is limited to 500MB in size before a new segment file is spawned
d) a "log file" can be composed of many "log segment files"

But, even after setting the above, I find that the total combined size of all Kafka logs on disk is 200GB right now. Isn't log.retention.size supposed to limit it to 100GB? Am I missing something? The docs are not really clear, especially when it comes to distinguishing between a "log file" and a "log segment file".

I have disk monitoring. But like anything else in software, even monitoring can fail. Via configuration, I'd like to make sure that Kafka does not write more than the available disk space. Or something like log4j, where I can set a max number of log files and the max size per file, which essentially allows me to set a max aggregate size limit across all logs.

Thanks,
-Vinh
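
The two-pass cleanup András describes in this thread (per-topic time and size limits first, then a global size limit) could be sketched roughly as follows. This is only an illustration under stated assumptions, not Kafka code: Segment, per_topic_pass, global_pass, and cleanup are hypothetical names, the last segment of each log is assumed to be the active (not yet rolled) one and is never deleted, and the global pass uses the simple "(full size - threshold)/#topics" share mentioned in the thread rather than a proportional one.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    size_bytes: int
    age_hours: float


# Assumption: a log is a list of segments ordered oldest-first, and the last
# entry is the active segment, which has not been rolled and is never deleted.
Log = List[Segment]


def log_size(log: Log) -> int:
    return sum(s.size_bytes for s in log)


def per_topic_pass(log: Log, max_hours: float, max_bytes: int) -> Log:
    """Pass (1): <time limit> OR <size limit>, applied to rolled segments only."""
    rolled, active = log[:-1], log[-1:]
    # Drop rolled segments older than the time limit.
    rolled = [s for s in rolled if s.age_hours <= max_hours]
    # If the size limit is still violated, drop the oldest rolled segments.
    while rolled and log_size(rolled + active) > max_bytes:
        rolled.pop(0)
    return rolled + active


def global_pass(logs: Dict[str, Log], global_max_bytes: int) -> Dict[str, Log]:
    """Pass (2): shed an equal share of the global excess from every topic."""
    total = sum(log_size(log) for log in logs.values())
    excess = total - global_max_bytes
    if excess <= 0 or not logs:
        return logs
    share = excess / len(logs)  # (full size - threshold) / #topics
    result = {}
    for topic, log in logs.items():
        rolled, active = log[:-1], log[-1:]
        shed = 0
        # Delete oldest rolled segments until this topic has shed its share.
        while rolled and shed < share:
            shed += rolled.pop(0).size_bytes
        result[topic] = rolled + active
    return result


def cleanup(logs: Dict[str, Log], max_hours: float, max_bytes: int,
            global_max_bytes: int) -> Dict[str, Log]:
    """Run pass (1) for every topic, then pass (2) across all topics."""
    after_first = {t: per_topic_pass(log, max_hours, max_bytes)
                   for t, log in logs.items()}
    return global_pass(after_first, global_max_bytes)
```

As Jun points out, once every log is down to a single segment the total can still exceed the global limit; this sketch simply leaves the limit exceeded in that case rather than rolling segments early.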