Re: Spread log segment deletion over a couple hours

2018-05-03 Thread Vincent Rischmann
Hi Jonathan. Yes, I decreased the retention on all topics simultaneously. I realized my mistake later when I saw the cluster overloaded :) I wasn't 100% sure so I looked it up, but it looks to me like log.cleaner.threads and log.cleaner.io.max.bytes.per.second only apply when a topic is using

Re: Spread log segment deletion over a couple hours

2018-05-02 Thread Jonathan Bethune
Howdy Vincent. Sounds like a painful situation! I have experienced similar drama with Kafka so maybe I can offer some advice. You said you decreased the retention time on 4 topics. I wonder, was this done on all 4 topics at the same time? Depending on broker and partition config, that can be ver

Spread log segment deletion over a couple hours

2018-05-02 Thread Vincent Rischmann
Hi, I'm wondering if there is a way to tell Kafka to spread the log file deletion when decreasing the retention time of a topic, and if not, if it would make sense. I'm asking because this afternoon, after decreasing the retention time from 2 months to 1 month on 4 of my topics, the whole cluster
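
One possible way to approximate "spreading" the deletion by hand, sketched here as an assumption and not something the thread confirms: lower retention.ms in several smaller steps instead of one jump, waiting between steps so each step only expires part of the backlog. The helper name retention_steps and the step count are hypothetical; applying each value to the topic is left out.

```python
DAY_MS = 24 * 60 * 60 * 1000  # milliseconds in one day


def retention_steps(current_ms, target_ms, steps):
    """Return the intermediate retention.ms values to apply, in order.

    The last value is always exactly target_ms. This only computes the
    schedule; it does not talk to Kafka.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if target_ms > current_ms:
        raise ValueError("target_ms must not exceed current_ms")
    delta = current_ms - target_ms
    # Integer arithmetic keeps the values valid for retention.ms.
    return [current_ms - (delta * i) // steps for i in range(1, steps + 1)]


if __name__ == "__main__":
    # Going from roughly 2 months to 1 month in three steps.
    for value in retention_steps(60 * DAY_MS, 30 * DAY_MS, 3):
        print(value // DAY_MS, "days ->", value, "ms")
```

Each printed value would be set as the topic's retention.ms in turn, with a pause in between; how long to pause depends on the cluster and is not covered here.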

Re: Log segment deletion

2018-01-30 Thread Guozhang Wang
Hi Martin, That is a good point. In fact in the coming release we have made such repartition topics really "transient" by periodically purging it with the embedded admin client, so we can actually set its retention to -1: https://cwiki.apache.org/confluence/display/KAFKA/KIP-220%3A+Add+AdminClien

Re: Log segment deletion

2018-01-30 Thread Martin Kleppmann
Hi Guozhang, Thanks very much for your reply. I am inclined to consider this a bug, since Kafka Streams in the default configuration is likely to run into this problem while reprocessing old messages, and in most cases the problem wouldn't be noticed (since there is no error -- the job just pro

Re: Log segment deletion

2018-01-29 Thread Guozhang Wang
Hello Martin, What you've observed is correct. More generally speaking, for various broker-side operations that are based on record timestamps and treat them as wall-clock time, there is a mismatch between the stream records' timestamps, which are basically "event time", and the broker's system wa

Re: Log segment deletion

2018-01-29 Thread Martin Kleppmann
Follow-up: I think we figured out what was happening. Setting the broker config log.message.timestamp.type=LogAppendTime (instead of the default value CreateTime) stopped the messages disappearing. The messages in the Streams app's input topic are older than the 24 hours default retention perio
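
A simplified model of time-based retention, to illustrate why the timestamp type matters. It assumes a segment becomes eligible for deletion once the newest record timestamp in it is older than the retention period; the function name segment_expired and the figures are made up for illustration, and real broker behaviour has more conditions than this.

```python
HOUR_MS = 60 * 60 * 1000  # milliseconds in one hour


def segment_expired(max_timestamp_ms, now_ms, retention_ms):
    """Simplified check: is the newest record older than the retention period?

    A negative retention_ms is treated as "keep forever".
    """
    if retention_ms < 0:
        return False
    return now_ms - max_timestamp_ms > retention_ms


if __name__ == "__main__":
    now = 1_517_227_200_000          # an arbitrary wall-clock instant
    event_time = now - 72 * HOUR_MS  # record produced from 3-day-old input
    retention = 24 * HOUR_MS

    # CreateTime: the record keeps its old event time, so it looks expired.
    print("CreateTime expired:", segment_expired(event_time, now, retention))
    # LogAppendTime: the broker stamps the record with its own clock.
    print("LogAppendTime expired:", segment_expired(now, now, retention))
```

Under this model, records written with old event times are immediately past retention when CreateTime is used, while LogAppendTime stamps them with the current time so they are kept.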

Log segment deletion

2018-01-29 Thread Martin Kleppmann
Hi all, We are debugging an issue with a Kafka Streams application that is producing incorrect output. The application is a simple group-by on a key, and then count. As expected, the application creates a repartitioning topic for the group-by stage. The problem appears to be that messages are g