Coming full circle on the "makes me worry" comment I left:
I asked the question in work channels about my concern and SteveL did
confirm that the "S3 strong consistency" feature does apply generally to
CRUD operations.
I believe this means, if we assume there is exactly one RegionServer
Just go ahead Josh, I haven't started to write the design doc yet.
Thank you for your help!
Josh Elser wrote on Tue, May 25, 2021 at 1:45 AM:
> Without completely opening Pandora's box, I will say we definitely have
> multiple ways we can solve the metadata management for tracking (e.g. in
> meta, in some
Oh, sorry. Missed that.
I think the key point here is that we should not have partial storefiles in
the data directory if we want to downgrade. This is possible by setting the
flag to false first, to prevent new partial storefiles, and then using an
HBCK command to remove all existing partial storefiles?
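A minimal sketch of what such an HBCK-style cleanup could decide, assuming files are tracked in a committed hfile list: anything in the store's data directory that is not in the list is a partial/uncommitted storefile and is safe to remove before a downgrade. The class and method names are illustrative only; no such HBCK command exists yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical cleanup logic for the downgrade path: compare the data
// directory contents against the committed hfile list and report the
// partial (uncommitted) files that should be deleted.
public class PartialStoreFileCleaner {
  public static List<String> filesToRemove(Set<String> dataDirFiles,
      Set<String> committedHFiles) {
    List<String> toRemove = new ArrayList<>();
    for (String f : dataDirFiles) {
      if (!committedHFiles.contains(f)) {
        toRemove.add(f); // not in the committed list => partial storefile
      }
    }
    return toRemove;
  }
}
```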
And
Without completely opening Pandora's box, I will say we definitely have
multiple ways we can solve the metadata management for tracking (e.g. in
meta, in some other system table, in some other system, in a per-store
file). Each of them has pros and cons, and each of them has a "flavor"
as to
I got pulled into a call with some folks from S3 at the last minute last
week.
There was a comment made in passing about reading the latest, written
version of a file. At the moment, I didn't want to digress into that
because of immutable HFiles. However, if we're tracking files-per-store
in
The important detail is that first there is an upgrade, across the whole
cluster, to a version that can support the new store layout, so there will
be no rolling-upgrade related issues when the new layout is enabled.
The new layout can be enabled with a new site config, a shell command to
set a schema
> I do not think it should be a table level config. It should be a cluster
level config. We only have one FileSystem so it is useless to let different
tables have different ways to store hfile list.
The perspective that calls this "useless" is a limited one.
In our clusters, we value
> And for downgrading, usually we do not support downgrading from a major
version upgrading, so it is not a big problem.
You missed an earlier comment from me.
Our team requires this to be released in a branch-2 version or we can't use
it. Therefore I am not in favor of any solution that
I do not think it should be a table level config. It should be a cluster
level config. We only have one FileSystem so it is useless to let different
tables have different ways to store hfile list.
But I think the general approach is fine. We could introduce a config for
whether to enable 'write
Put a check in the code for whether hfilelist mode or the original store
layout is in use, and handle both cases. Then, to upgrade:
1. First, perform a rolling upgrade to $NEW_VERSION .
2. Once upgraded to $NEW_VERSION execute an alter table command that enables
hfilelist mode. This will cause all
I could put up a simple design doc for this.
But there is still a problem, about how to do a rolling upgrade.
After we change the behavior, the region server will write partial store
files directly into the data directory. For new region servers, this is not
a problem, as we will read the
The HBASE-24749 design and implementation had acknowledged compromises at
review time: e.g. adding a new 'system table' to hold store files. I'd
suggest the design and implementation need a revisit before we go forward;
for instance, factoring in systems other than S3, as suggested above (I like
the Duo
So maybe we could introduce a .hfilelist directory, and put the hfilelist
files under this directory, so we do not need to list all the files under
the region directory.
And considering the likely implementation in typical object storages,
listing only the last directory on the path will be
> On May 21, 2021, at 6:07 PM, 张铎 wrote:
>
> Since we just make use of the general FileSystem API to do listing, is it
> possible to make use of 'bucket index listing'?
Yes, those words mean the same thing.
>
> Andrew Purtell wrote on Sat, May 22, 2021 at 6:34 AM:
>
>>
>>
>>> On May 20, 2021, at
Since we just make use of the general FileSystem API to do listing, is it
possible to make use of 'bucket index listing'?
Andrew Purtell wrote on Sat, May 22, 2021 at 6:34 AM:
>
>
> > On May 20, 2021, at 4:00 AM, Wellington Chevreuil <
> wellington.chevre...@gmail.com> wrote:
> >
> >
> >>
> >>
> >> IMO it
> On May 20, 2021, at 4:00 AM, Wellington Chevreuil
> wrote:
>
>
>>
>>
>> IMO it should be a file per store.
>> Per region is not suitable here as compaction is per store.
>> Per file means we still need to list all the files. And usually, after
>> compaction, we need to do an atomic
>
> IMO it should be a file per store.
> Per region is not suitable here as compaction is per store.
> Per file means we still need to list all the files. And usually, after
> compaction, we need to do an atomic operation to remove several old files
> and add a new file, or even several files for
IIRC, S3 was the only object storage which did not guarantee
read-after-write consistency in the past...
Here is the quick result after googling:
AWS [1]
> Amazon S3 delivers strong read-after-write consistency automatically for
> all applications
Azure[2]
> Azure Storage was designed to
Oh, just saw your last comment.
IMO it should be a file per store.
Per region is not suitable here as compaction is per store.
Per file means we still need to list all the files. And usually, after
compaction, we need to do an atomic operation to remove several old files
and add a new file, or
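The atomic replace described above — dropping a compaction's input files and adding its output(s) in a single step — can be sketched as a single rewrite of the (small) per-store list, so a reader sees either the complete old set or the complete new set. Class and method names below are illustrative, not from HBase:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the atomic swap a compaction needs when hfiles are tracked in a
// per-store list file: compute the new file set, then rewrite the whole
// list file in one operation (not shown here).
public class HFileListSwap {
  public static Set<String> afterCompaction(Set<String> current,
      Set<String> compactedAway, Set<String> compactionOutputs) {
    Set<String> next = new HashSet<>(current);
    next.removeAll(compactedAway);   // inputs consumed by the compaction
    next.addAll(compactionOutputs);  // newly written hfile(s)
    return next;
  }
}
```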
I like the idea of tracking via files in the store. We might even write a
single "hfile.commit" file for each "hfile" that got committed and has to
be loaded. When the store is opening, any hfile that doesn't have a
corresponding .commit file should not be loaded. That removes the
need for
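The marker-file scheme sketched above boils down to a filter at store-open time: load an hfile only if a matching "<name>.commit" marker is present in the same listing. File-naming here is illustrative; the real layout would come out of the design doc.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of the ".commit" marker idea: given a directory listing, an hfile
// is loadable only if its companion "<name>.commit" marker also exists.
public class CommitMarkerFilter {
  public static List<String> loadableHFiles(Set<String> dirListing) {
    List<String> loadable = new ArrayList<>();
    for (String f : dirListing) {
      if (!f.endsWith(".commit") && dirListing.contains(f + ".commit")) {
        loadable.add(f); // committed hfile: marker present
      }
    }
    return loadable;
  }
}
```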
Consistent read-what-you-wrote bucket metadata operations are standard now
for S3, Google's GCS, and anyone who uses Ceph via its rados-gw. I think it
will
be table stakes for cloud object storage. Although clients will all see the
latest metadata state for an object updated in an atomic way,
On Wed, May 19, 2021 at 8:19 AM 张铎(Duo Zhang) wrote:
> What about just storing the hfile list in a file? Since now S3 has strong
> consistency, we could safely overwrite a file then I think?
>
My concern is about portability. S3 isn't the only blob store in town, and
consistent
What about just storing the hfile list in a file? Since now S3 has strong
consistency, we could safely overwrite a file then I think?
And since the hfile list file will be very small, renaming will not be a
big problem.
We could write the hfile list to a file called 'hfile.list.tmp', and then
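The write-then-rename protocol for the list file can be sketched with plain `java.nio` (the real code would go through Hadoop's `FileSystem` API, where rename semantics differ, e.g. on S3A). The assumption, following the 'hfile.list.tmp' idea above, is that the temporary file is moved over a final 'hfile.list' so readers only ever see a complete old or complete new list:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: write the new hfile list to "hfile.list.tmp", then move it over
// "hfile.list". ATOMIC_MOVE is best-effort illustration; a real FileSystem
// implementation (HDFS, S3A) has its own rename/overwrite semantics.
public class HFileListWriter {
  public static void writeList(Path storeDir, String listContents)
      throws IOException {
    Path tmp = storeDir.resolve("hfile.list.tmp");
    Path fin = storeDir.resolve("hfile.list");
    Files.write(tmp, listContents.getBytes(StandardCharsets.UTF_8));
    Files.move(tmp, fin, StandardCopyOption.REPLACE_EXISTING,
        StandardCopyOption.ATOMIC_MOVE);
  }
}
```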
Thank you, Andrew and Duo,
Talking internally with Josh Elser, the initial idea was to rebase the
feature branch onto master (in order to catch up with the latest commits),
then focus on
work to have a minimal functioning hbase, in other words, together with the
already committed work from HBASE-25391, make
S3 now supports strong consistency, and I heard that they are also
implementing atomic renaming currently, so maybe that's one of the reasons
why the development is silent now...
For me, I also think deploying hbase on cloud storage is the future, so I
would also like to participate here.
But I
Wellington (et al.),
S3 is also an important piece of our future production plans.
Unfortunately, we were unable to assist much with last year's work, on
account of being sidetracked by more immediate concerns. Fortunately, this
renewed interest is timely in that we have an HBase 2 project
Greetings everyone,
HBASE-24749 was proposed almost a year ago, introducing a new
StoreFile tracker as a way to allow for any hbase hfile modifications to be
safely completed without needing a file system rename. This seems pretty
relevant for deployments over S3 file systems, where rename