[ https://issues.apache.org/jira/browse/KAFKA-17428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Federico Valeri updated KAFKA-17428:
------------------------------------
    Summary: Add retry mechanism for cleaning up dangling remote segments  
(was: Remote segments stay in COPY_SEGMENT_STARTED state after RLMCopyTask 
fails to upload)

> Add retry mechanism for cleaning up dangling remote segments
> ------------------------------------------------------------
>
>                 Key: KAFKA-17428
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17428
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Luke Chen
>            Assignee: Federico Valeri
>            Priority: Major
>
> Currently, in copyLogSegment in RLMCopyTask we delete segments whose upload 
> failed and segments whose custom metadata size exceeded the limit. But after 
> deletion, these segments still remain in the {{COPY_SEGMENT_STARTED}} state, 
> which might cause unexpected issues in the future. We should instead move the 
> state from {{COPY_SEGMENT_STARTED}} -> {{DELETE_SEGMENT_STARTED}} -> 
> {{DELETE_SEGMENT_FINISHED}} (see the first sketch after this message).
>  
> Updated:
> I thought about this when I first looked at it, and one thing that bothered 
> me is that {{DELETE_SEGMENT_STARTED}} means to me that we are now in a state 
> where we attempt deletion. However, if the remote store is down and we fail 
> to both copy and delete, we will leave that segment in 
> {{DELETE_SEGMENT_STARTED}} and not attempt to delete it until the segment 
> itself breaches retention.ms/retention.bytes.
> We can probably just make that clearer, but that was my thought at the time.
> So, maybe in the deletion loop we can add {{DELETE_SEGMENT_STARTED}} segments 
> to the deletion candidates directly, but that also needs to take the 
> retention size calculation into account (see the second sketch below).
>  
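
A minimal sketch of the proposed cleanup, assuming the standard 
org.apache.kafka.server.log.remote.storage plugin interfaces and the KIP-917 
RemoteLogSegmentMetadataUpdate constructor that takes an optional custom 
metadata argument; the class and method names (DanglingSegmentCleaner, cleanup) 
are hypothetical and not the actual RLMCopyTask code:

{code:java}
import java.util.Optional;

import org.apache.kafka.server.log.remote.storage.RemoteLogMetadataManager;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadataUpdate;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentState;
import org.apache.kafka.server.log.remote.storage.RemoteStorageManager;

// Hypothetical helper, not the actual RLMCopyTask code: walks a dangling
// segment through the full delete lifecycle after a failed upload.
class DanglingSegmentCleaner {

    private final RemoteLogMetadataManager remoteLogMetadataManager;
    private final RemoteStorageManager remoteStorageManager;
    private final int brokerId;

    DanglingSegmentCleaner(RemoteLogMetadataManager rlmm, RemoteStorageManager rsm, int brokerId) {
        this.remoteLogMetadataManager = rlmm;
        this.remoteStorageManager = rsm;
        this.brokerId = brokerId;
    }

    // Called after copyLogSegmentData fails or the custom metadata size limit is
    // exceeded, while the segment is still in COPY_SEGMENT_STARTED.
    void cleanup(RemoteLogSegmentMetadata segmentMetadata) throws Exception {
        // COPY_SEGMENT_STARTED -> DELETE_SEGMENT_STARTED
        remoteLogMetadataManager.updateRemoteLogSegmentMetadata(
            new RemoteLogSegmentMetadataUpdate(segmentMetadata.remoteLogSegmentId(),
                System.currentTimeMillis(), Optional.empty(),
                RemoteLogSegmentState.DELETE_SEGMENT_STARTED, brokerId)).get();

        // Remove whatever was (partially) uploaded to the remote store.
        remoteStorageManager.deleteLogSegmentData(segmentMetadata);

        // DELETE_SEGMENT_STARTED -> DELETE_SEGMENT_FINISHED
        remoteLogMetadataManager.updateRemoteLogSegmentMetadata(
            new RemoteLogSegmentMetadataUpdate(segmentMetadata.remoteLogSegmentId(),
                System.currentTimeMillis(), Optional.empty(),
                RemoteLogSegmentState.DELETE_SEGMENT_FINISHED, brokerId)).get();
    }
}
{code}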

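And a sketch of how the deletion loop could pick up dangling 
{{DELETE_SEGMENT_STARTED}} segments directly while keeping them out of the 
retention-size accounting; again, DanglingSegmentRetry and 
retryDanglingAndComputeRetainedSize are hypothetical names, not existing code:

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

import org.apache.kafka.common.TopicIdPartition;
import org.apache.kafka.server.log.remote.storage.RemoteLogMetadataManager;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadataUpdate;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentState;
import org.apache.kafka.server.log.remote.storage.RemoteStorageManager;

// Hypothetical retry pass for the deletion loop: segments left in
// DELETE_SEGMENT_STARTED by a failed copy/delete are retried directly and
// excluded from the retention-size calculation.
class DanglingSegmentRetry {

    private final RemoteLogMetadataManager remoteLogMetadataManager;
    private final RemoteStorageManager remoteStorageManager;
    private final int brokerId;

    DanglingSegmentRetry(RemoteLogMetadataManager rlmm, RemoteStorageManager rsm, int brokerId) {
        this.remoteLogMetadataManager = rlmm;
        this.remoteStorageManager = rsm;
        this.brokerId = brokerId;
    }

    // Returns the total size (bytes) of segments that still count toward
    // retention.bytes, after retrying the deletion of any dangling segments.
    long retryDanglingAndComputeRetainedSize(TopicIdPartition tp) throws Exception {
        long retainedSizeBytes = 0L;
        List<RemoteLogSegmentMetadata> dangling = new ArrayList<>();

        Iterator<RemoteLogSegmentMetadata> it = remoteLogMetadataManager.listRemoteLogSegments(tp);
        while (it.hasNext()) {
            RemoteLogSegmentMetadata metadata = it.next();
            if (metadata.state() == RemoteLogSegmentState.DELETE_SEGMENT_STARTED) {
                // Dangling segment from a previous failed copy/delete: retry it,
                // and do not count its bytes toward the retention size.
                dangling.add(metadata);
            } else if (metadata.state() == RemoteLogSegmentState.COPY_SEGMENT_FINISHED) {
                retainedSizeBytes += metadata.segmentSizeInBytes();
            }
        }

        for (RemoteLogSegmentMetadata metadata : dangling) {
            // Retry the remote deletion before marking the segment finished.
            remoteStorageManager.deleteLogSegmentData(metadata);
            remoteLogMetadataManager.updateRemoteLogSegmentMetadata(
                new RemoteLogSegmentMetadataUpdate(metadata.remoteLogSegmentId(),
                    System.currentTimeMillis(), Optional.empty(),
                    RemoteLogSegmentState.DELETE_SEGMENT_FINISHED, brokerId)).get();
        }
        return retainedSizeBytes;
    }
}
{code}

Keeping dangling segments out of retainedSizeBytes is one way to address the 
retention-size concern above: their bytes are about to be removed anyway, so 
counting them would make the partition look larger than it really is.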


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
