You cannot delete arbitrary data; however, it is possible to send a "truncate request" to brokers to delete data before the retention time is reached:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-107%3A+Add+deleteRecordsBefore%28%29+API+in+AdminClient

There is an `AdminClient#deleteRecords(...)` API to do so.

-Matthias

On 8/21/19 9:09 PM, Murilo Tavares wrote:
> Thanks Matthias for the prompt response.
> Now just for curiosity, how does that work? I thought it was not possible
> to easily delete topic data...
>
> On Wed, Aug 21, 2019 at 4:51 PM Matthias J. Sax <matth...@confluent.io> wrote:
>
>> No need to worry about this.
>>
>> Kafka Streams uses "purge data" calls to actively delete data from
>> those topics after the records are processed. Hence, those topics won't
>> grow unbounded but are "truncated" on a regular basis.
>>
>> -Matthias
>>
>> On 8/21/19 11:38 AM, Murilo Tavares wrote:
>>> Hi
>>> I have a complex KafkaStreams topology, where I have a bunch of KTables
>>> that I regroup (rekeying) and aggregate so I can join them.
>>> I've noticed that the "-repartition" topics created by the groupBy
>>> operations have a very long retention by default (Long.MAX_VALUE).
>>> I'm a bit concerned about the size of these topics, as they will retain
>>> data forever. I wonder why they are so long, and what would be the impact
>>> of reducing this retention?
>>> Thanks
>>> Murilo
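
For reference, here is a minimal sketch of what a manual call to this API looks like. Kafka Streams issues the equivalent "purge data" calls internally, so this is only needed if you want to truncate a topic yourself; the bootstrap server, topic name, partition, and offset below are placeholders.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DeleteRecordsResult;
import org.apache.kafka.clients.admin.RecordsToDelete;
import org.apache.kafka.common.TopicPartition;

public class TruncateTopicExample {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder broker address.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Truncate partition 0 of "my-topic" up to (but not including) offset 1000.
            TopicPartition tp = new TopicPartition("my-topic", 0);
            Map<TopicPartition, RecordsToDelete> toDelete =
                    Collections.singletonMap(tp, RecordsToDelete.beforeOffset(1000L));

            DeleteRecordsResult result = admin.deleteRecords(toDelete);

            // Each partition gets a future; get() blocks until the broker
            // confirms the new log start offset (the "low watermark").
            long lowWatermark = result.lowWatermarks().get(tp).get().lowWatermark();
            System.out.println("Partition " + tp + " now starts at offset " + lowWatermark);
        }
    }
}
```

Records below the returned low watermark are no longer readable and will be removed by the brokers, independently of the topic's retention settings.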