[ 
https://issues.apache.org/jira/browse/KUDU-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adar Dembo reopened KUDU-1853:
------------------------------

With the revert of 72541b47eb55b2df4eab5d6050f517476ed6d370 (see KUDU-1968), 
this bug is no longer fixed. Well, it's still fixed for 1.3.0 (which has 
already been released), but not for any subsequent release.

> Error during tablet copy may orphan a bunch of stuff
> ----------------------------------------------------
>
>                 Key: KUDU-1853
>                 URL: https://issues.apache.org/jira/browse/KUDU-1853
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet, tserver
>    Affects Versions: 1.2.0
>            Reporter: Adar Dembo
>            Assignee: Mike Percy
>            Priority: Critical
>             Fix For: 1.3.0
>
>
> Currently, a failure during tablet copy may leave behind a number of 
> different things:
> # Downloaded superblock (if the failure falls after TabletCopyClient::Start())
> # Downloaded data blocks (if the failure falls during 
> TabletCopyClient::FetchAll())
> # Downloaded WAL segments (if the failure falls during 
> TabletCopyClient::FetchAll())
> # Downloaded cmeta file (if the failure falls during 
> TabletCopyClient::Finish())
> The next time the tserver starts, it'll see that this tablet's state is still 
> TABLET_DATA_COPYING and will tombstone it. That takes care of #1, #3, and #4 
> (well, it leaves the cmeta file behind as the tombstone, but that's 
> intentional).
> Unfortunately, the downloaded data blocks (#2) are orphaned: the on-disk 
> superblock has no record of the new blocks, and so they aren't deleted.
> We're already tracking a general purpose GC mechanism for data blocks in 
> KUDU-829, but I think this separate JIRA for describing the problem with 
> tablet copy is useful, if only as a reference for users.
> Separately, it may be worth addressing these issues for failures that don't 
> result in tserver crashes, such as intermittent network outages between 
> tservers. A long-lived tserver won't GC for some time, and it'd be nice to 
> reclaim the disk space used by these orphaned objects in the interim. 
> Implementing this kind of "GC" for data blocks is also a lot easier than a 
> general purpose GC.
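
The orphaning described in the issue can be illustrated with a minimal, self-contained sketch. It is not Kudu code and not necessarily how the fix was implemented: BlockId, CopySessionTracker, and FindOrphanedBlocks are hypothetical names invented for this example. The sketch assumes a copy session can remember every block it has written, so that an aborted session knows what to delete, and it shows why blocks missing from the superblock are invisible to tombstoning.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical stand-in for a block identifier; not Kudu's actual type.
using BlockId = std::uint64_t;

// Hypothetical per-session tracker. It records each block written during a
// tablet copy so the blocks can be deleted if the session fails before a
// superblock that references them has been persisted.
class CopySessionTracker {
 public:
  void RecordDownloaded(BlockId id) { downloaded_.insert(id); }

  // Call once the superblock referencing the new blocks is durably on disk.
  void Commit() { committed_ = true; }

  // Blocks that should be deleted if the session is abandoned now.
  // Empty after Commit(), because the superblock then owns the blocks.
  std::vector<BlockId> BlocksToDelete() const {
    if (committed_) return {};
    return std::vector<BlockId>(downloaded_.begin(), downloaded_.end());
  }

 private:
  std::set<BlockId> downloaded_;
  bool committed_ = false;
};

// Blocks that exist on disk but that no superblock references. These are the
// orphans from the issue: a cleanup pass driven only by the superblock never
// sees them.
std::vector<BlockId> FindOrphanedBlocks(const std::set<BlockId>& on_disk,
                                        const std::set<BlockId>& referenced) {
  std::vector<BlockId> orphans;
  std::set_difference(on_disk.begin(), on_disk.end(),
                      referenced.begin(), referenced.end(),
                      std::back_inserter(orphans));
  return orphans;
}
```

In this model, a failure during the fetch phase leaves BlocksToDelete() non-empty, which gives a long-lived tserver a way to reclaim the space immediately. FindOrphanedBlocks corresponds to the more expensive general purpose scan that would otherwise be needed.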



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
