Luke Chen created KAFKA-15414:
---------------------------------

             Summary: remote logs get deleted after partition reassignment
                 Key: KAFKA-15414
                 URL: https://issues.apache.org/jira/browse/KAFKA-15414
             Project: Kafka
          Issue Type: Bug
            Reporter: Luke Chen
         Attachments: image-2023-08-29-11-12-58-875.png

It seems I'm reaching that code path when running reassignments on my cluster, and segments are deleted from the remote store despite a huge retention (the topic was created a few hours ago with 1000h retention). It happens consistently on some partitions when reassigning, but not on all partitions.

My test:
* I have a test topic with 30 partitions, configured with 1000h global retention and 2 minutes local retention.
* I have a load tester producing to all partitions evenly.
* I have a consumer load tester consuming that topic.
* I regularly reset offsets to earliest on my consumer to test backfilling from tiered storage.

My consumer was catching up on the backlog and I wanted to scale up my cluster to speed up recovery: I scaled my cluster from 3 to 12 brokers and reassigned my test topic to all available brokers to have an even leader/follower count per broker.

When I triggered the reassignment, the consumer lag dropped on some of my topic partitions:

!image-2023-08-29-11-12-58-875.png|width=800,height=79!

Later I tried to reassign my topic back to 3 brokers and the issue happened again.

Both times, I saw a bunch of entries like this in my logs:

[RemoteLogManager=10005 partition=uR3O_hk3QRqsn4mPXGFoOw:loadtest11-17] Deleted remote log segment RemoteLogSegmentId {topicIdPartition=uR3O_hk3QRqsn4mPXGFoOw:loadtest11-17, id=Mk0chBQrTyKETTawIulQog} due to leader epoch cache truncation. Current earliest epoch: EpochEntry(epoch=14, startOffset=46776780), segmentEndOffset: 46437796 and segmentEpochs: [10]

Looking at my S3 bucket, the segments from before my reassignment have indeed been deleted.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)