Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-25 Thread Josh Elser
Coming full circle on the "makes me worry" comment I left: I asked the question in work channels about my concern and SteveL did confirm that the "S3 strong consistency" feature does apply generally to CRUD operations. I believe this means, if we assume there is exactly one RegionServer

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-24 Thread Duo Zhang
Just go ahead Josh, I haven't started to write the design doc yet. Thank you for your help! Josh Elser wrote on Tue, May 25, 2021 at 1:45 AM: > Without completely opening Pandora's box, I will say we definitely have > multiple ways we can solve the metadata management for tracking (e.g. in > meta, in some

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-24 Thread Duo Zhang
Oh, sorry. Missed that. I think the key point here is we should not have partial storefiles in the data directory if we want to downgrade. This is possible by setting the flag to false first to prevent new partial storefiles, and then using an HBCK command to remove all the partial storefiles? And

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-24 Thread Josh Elser
Without completely opening Pandora's box, I will say we definitely have multiple ways we can solve the metadata management for tracking (e.g. in meta, in some other system table, in some other system, in a per-store file). Each of them has pros and cons, and each of them has "favor" as to

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-24 Thread Josh Elser
I got pulled into a call with some folks from S3 at the last minute late last week. There was a comment made in passing about reading the latest written version of a file. At the time, I didn't want to digress into that because of immutable HFiles. However, if we're tracking files-per-store in

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-24 Thread Andrew Purtell
The important detail is first there is an upgrade to a version that can support the new store layout across the whole cluster, so there will be no rolling upgrade related issues when the new layout is enabled. The new layout can be enabled with a new site config, a shell command to set a schema

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-24 Thread Andrew Purtell
> I do not think it should be a table level config. It should be a cluster level config. We only have one FileSystem so it is useless to let different tables have different ways to store hfile list. The perspective that claims this is "useless" is a limited perspective. In our clusters, we value

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-24 Thread Andrew Purtell
> And for downgrading, usually we do not support downgrading from a major version upgrading, so it is not a big problem. You missed an earlier comment from me. Our team requires this to be released in a branch-2 version or we can't use it. Therefore I am not in favor of any solution that

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-23 Thread Duo Zhang
I do not think it should be a table level config. It should be a cluster level config. We only have one FileSystem so it is useless to let different tables have different ways to store hfile list. But I think the general approach is fine. We could introduce a config for whether to enable 'write

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-22 Thread Andrew Purtell
Put a check in the code for whether hfilelist mode or the original store layout is in use, and handle both cases. Then, to upgrade: 1. First, perform a rolling upgrade to $NEW_VERSION. 2. Once upgraded to $NEW_VERSION, execute an alter table command that enables hfilelist mode. This will cause all
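The "handle both cases" check described above could look roughly like the following. This is only an illustrative sketch, not HBase code: the function name `visible_store_files`, the boolean flag, and the idea of passing the tracked list in as an argument are all assumptions made for the example.

```python
def visible_store_files(listing, hfilelist_enabled, tracked=None):
    """Return the store files a region server should open.

    Hypothetical sketch. `listing` is the set of file names found in the
    store directory; `tracked` is the content of the store's hfile list.
    """
    if not hfilelist_enabled:
        # Original layout: files only reach the data directory through a
        # rename after they are complete, so every listed file is committed.
        return sorted(listing)
    # hfilelist mode: partially written files may sit in the data directory,
    # so only files named in the tracked list are considered committed.
    tracked_set = set(tracked or [])
    return sorted(name for name in listing if name in tracked_set)
```

With the flag off the listing is trusted as-is; with it on, a file that is present but not yet tracked (for example a partially written compaction output) is ignored.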

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-22 Thread Duo Zhang
I could put up a simple design doc for this. But there is still a problem: how to do a rolling upgrade. After we change the behavior, the region server will write partial store files directly into the data directory. For new region servers, this is not a problem, as we will read the

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-21 Thread Stack
HBASE-24749 design and implementation had acknowledged compromises on review: e.g. adding a new 'system table' to hold store files. I'd suggest the design and implementation need a revisit before we go forward; for instance, factoring for systems other than s3 as suggested above (I like the Duo

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-21 Thread Duo Zhang
So maybe we could introduce a .hfilelist directory, and put the hfilelist files under this directory, so we do not need to list all the files under the region directory. And considering the possible implementation for typical object storages, listing the last directory on the whole path will be

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-21 Thread Andrew Purtell
> On May 21, 2021, at 6:07 PM, 张铎 wrote: > > Since we just make use of the general FileSystem API to do listing, is it > possible to make use of 'bucket index listing'? Yes, those words mean the same thing. > > Andrew Purtell wrote on Sat, May 22, 2021 at 6:34 AM: > >> >> >>> On May 20, 2021, at

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-21 Thread Duo Zhang
Since we just make use of the general FileSystem API to do listing, is it possible to make use of 'bucket index listing'? Andrew Purtell wrote on Sat, May 22, 2021 at 6:34 AM: > > > > On May 20, 2021, at 4:00 AM, Wellington Chevreuil < > wellington.chevre...@gmail.com> wrote: > > > > > >> > >> > >> IMO it

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-21 Thread Andrew Purtell
> On May 20, 2021, at 4:00 AM, Wellington Chevreuil wrote: > > >> >> >> IMO it should be a file per store. >> Per region is not suitable here as compaction is per store. >> Per file means we still need to list all the files. And usually, after >> compaction, we need to do an atomic

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-20 Thread Wellington Chevreuil
> > IMO it should be a file per store. > Per region is not suitable here as compaction is per store. > Per file means we still need to list all the files. And usually, after > compaction, we need to do an atomic operation to remove several old files > and add a new file, or even several files for

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-19 Thread Duo Zhang
IIRC S3 was the only object storage which did not guarantee read-after-write consistency in the past... This is the quick result after googling AWS [1] > Amazon S3 delivers strong read-after-write consistency automatically for > all applications Azure[2] > Azure Storage was designed to

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-19 Thread Duo Zhang
Oh, just saw your last comment. IMO it should be a file per store. Per region is not suitable here as compaction is per store. Per file means we still need to list all the files. And usually, after compaction, we need to do an atomic operation to remove several old files and add a new file, or
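A minimal sketch of why a single per-store list suits compaction: the "remove several old files and add a new one" step becomes one replacement of the whole list. The function name and the list-of-names representation are assumptions for illustration only, not the HBASE-24749 design.

```python
def apply_compaction(tracked, compacted_inputs, compaction_outputs):
    """Compute the store's next hfile list after a compaction.

    Hypothetical sketch. The caller would persist the returned list with
    one write, so a reader sees either the old set or the new set and
    never a mix of the two.
    """
    tracked_set = set(tracked)
    inputs = set(compacted_inputs)
    missing = inputs - tracked_set
    if missing:
        # Refuse to commit a compaction whose inputs are not tracked.
        raise ValueError(f"inputs not tracked: {sorted(missing)}")
    return sorted((tracked_set - inputs) | set(compaction_outputs))
```

For example, compacting `f1` and `f2` into `f4` in a store that also holds `f3` yields the list `f3, f4` in one step.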

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-19 Thread Wellington Chevreuil
I like the idea of tracking via files in the store. We might even do a single "hfile.commit" file for each "hfile" that got committed and has to be loaded. When the store is opening, any hfile that doesn't have a corresponding .commit file should not be loaded. That removes the need for
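The marker-file idea can be sketched as a filter over a store directory listing. The `<hfile>.commit` naming scheme and the function name are assumptions made for this example; the message does not spell out the exact naming.

```python
def loadable_hfiles(names):
    """Return the hfiles that have a matching '<name>.commit' marker.

    Hypothetical sketch: `names` is a plain listing of the store directory.
    """
    present = set(names)
    return sorted(
        name
        for name in present
        if not name.endswith(".commit") and f"{name}.commit" in present
    )
```

Note that this approach still needs a listing of the store directory at open time, which is the cost raised elsewhere in the thread for a per-file scheme.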

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-19 Thread Andrew Purtell
Consistent read-what-you-wrote bucket metadata operations are standard now for S3, Google’s GCS, and anyone who uses Ceph via its radosgw. I think it will be table stakes for cloud object storage. Although clients will all see the latest metadata state for an object updated in an atomic way,

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-19 Thread Nick Dimiduk
On Wed, May 19, 2021 at 8:19 AM 张铎(Duo Zhang) wrote: > What about just storing the hfile list in a file? Since now S3 has strong > consistency, we could safely overwrite a file then I think? > My concern is about portability. S3 isn't the only blob store in town, and consistent

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-19 Thread Duo Zhang
What about just storing the hfile list in a file? Since now S3 has strong consistency, we could safely overwrite a file then I think? And since the hfile list file will be very small, renaming will not be a big problem. We could write the hfile list to a file called 'hfile.list.tmp', and then
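A rough local-file-system sketch of the write-to-`hfile.list.tmp` approach, assuming the step after writing the temporary file is a rename into place (the message mentions renaming the small list file). The final name `hfile.list`, the JSON encoding, and the function names are assumptions; an object store would go through the Hadoop FileSystem API rather than `os.replace`, with different atomicity guarantees.

```python
import json
import os
import tempfile


def write_hfile_list(store_dir, hfiles):
    """Write the hfile list to 'hfile.list.tmp', then rename it into place."""
    tmp_path = os.path.join(store_dir, "hfile.list.tmp")
    final_path = os.path.join(store_dir, "hfile.list")  # assumed final name
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(sorted(hfiles), f)
        f.flush()
        os.fsync(f.fileno())
    # On a local POSIX file system this replaces the old list atomically.
    os.replace(tmp_path, final_path)
    return final_path


def read_hfile_list(store_dir):
    """Read the committed hfile list, or return [] if none was written yet."""
    path = os.path.join(store_dir, "hfile.list")
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    write_hfile_list(demo_dir, ["hfile-2", "hfile-1"])
    print(read_hfile_list(demo_dir))
```

Because the list is tiny, the rename cost is negligible compared with renaming the hfiles themselves, which is the point made in the message.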

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-19 Thread Wellington Chevreuil
Thank you, Andrew and Duo. Talking internally with Josh Elser, the initial idea was to rebase the feature branch on master (in order to catch up with the latest commits), then focus on work to have a minimally functioning hbase; in other words, together with the already committed work from HBASE-25391, make

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-18 Thread Duo Zhang
S3 now supports strong consistency, and I heard that they are also implementing atomic renaming currently, so maybe that's one of the reasons why the development is silent now... For me, I also think deploying hbase on cloud storage is the future, so I would also like to participate here. But I

Re: [DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-18 Thread Andrew Purtell
Wellington (et al.), S3 is also an important piece of our future production plans. Unfortunately, we were unable to assist much with last year's work, on account of being sidetracked by more immediate concerns. Fortunately, this renewed interest is timely in that we have an HBase 2 project

[DISCUSS] Implement and release HBASE-24749 (an hfile tracker that allows for avoiding renames)

2021-05-18 Thread Wellington Chevreuil
Greetings everyone, HBASE-24749 was proposed almost a year ago, introducing a new StoreFile tracker as a way to allow for any hbase hfile modifications to be safely completed without needing a file system rename. This seems pretty relevant for deployments over S3 file systems, where rename