This is an automated email from the ASF dual-hosted git repository. elserj pushed a commit to branch HBASE-26067 in repository https://gitbox.apache.org/repos/asf/hbase.git
The following commit(s) were added to refs/heads/HBASE-26067 by this push: new 1dbdefc HBASE-26265 Update ref guide to mention the new store file tracker im… (#3942) 1dbdefc is described below commit 1dbdefc7c165e13de1272d9f03a8ef78980d36e2 Author: Wellington Ramos Chevreuil <wchevre...@apache.org> AuthorDate: Thu Dec 16 21:07:38 2021 +0000 HBASE-26265 Update ref guide to mention the new store file tracker im… (#3942) --- .../asciidoc/_chapters/store_file_tracking.adoc | 145 +++++++++++++++++++++ src/main/asciidoc/book.adoc | 1 + 2 files changed, 146 insertions(+) diff --git a/src/main/asciidoc/_chapters/store_file_tracking.adoc b/src/main/asciidoc/_chapters/store_file_tracking.adoc new file mode 100644 index 0000000..74d802f --- /dev/null +++ b/src/main/asciidoc/_chapters/store_file_tracking.adoc @@ -0,0 +1,145 @@ +//// +/** + * + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +//// + +[[storefiletracking]] += Store File Tracking +:doctype: book +:numbered: +:toc: left +:icons: font +:experimental: + +== Overview + +This feature introduces an abstraction layer to track store files still used/needed by store +engines, allowing for plugging different approaches of identifying store +files required by the given store. + +Historically, HBase internals have relied on creating hfiles on temporary directories first, renaming +those files to the actual store directory at operation commit time. That's a simple and convenient +way to separate transient from already finalised files that are ready to serve client reads with data. +This approach works well with strong consistent file systems, but with the popularity of less consistent +file systems, mainly Object Store which can be used like file systems, dependency on atomic rename operations starts to introduce +performance penalties. The Amazon S3 Object Store, in particular, has been the most affected deployment, +due to its lack of atomic renames. The HBase community temporarily bypassed this problem by building a distributed locking layer called HBOSS, +to guarantee atomicity of operations against S3. + +With *Store File Tracking*, decision on where to originally create new hfiles and how to proceed upon +commit is delegated to the specific Store File Tracking implementation. +The implementation can be set at the HBase service leve in *hbase-site.xml* or at the +Table or Column Family via the TableDescriptor configuration. + +NOTE: When the store file tracking implementation is specified in *hbase_site.xml*, this configuration is also propagated into a tables configuration +at table creation time. This is to avoid dangerous configuration mismatches between processes, which +could potentially lead to data loss. + +== Available Implementations + +Store File Tracking initial version provides three builtin implementations: + +* DEFAULT +* FILE +* MIGRATION + +### DEFAULT + +As per the name, this is the Store File Tracking implementation used by default when no explicit +configuration has been defined. The DEFAULT tracker implements the standard approach using temporary +directories and renames. This is how all previous (implicit) implementation that HBase used to track store files. + +### FILE + +A file tracker implementation that creates new files straight in the store directory, avoiding the +need for rename operations. It keeps a list of committed hfiles in memory, backed by meta files, in +each store directory. Whenever a new hfile is committed, the list of _tracked files_ in the given +store is updated and a new meta file is written with this list contents, discarding the previous +meta file now containing an out dated list. + +### MIGRATION + +A special implementation to be used when swapping between Store File Tracking implementations on +pre-existing tables that already contain data, and therefore, files being tracked under an specific +logic. + +== Usage + +For fresh deployments that don't yet contain any user data, *FILE* implementation can be just set as +value for *hbase.store.file-tracker.impl* property in global *hbase-site.xml* configuration, prior +to the first hbase start. Omitting this property sets the *DEFAULT* implementation. + +For clusters with data that are upgraded to a version of HBase containing the store file tracking +feature, the Store File Tracking implementation can only be changed with the *MIGRATION* +implementation, so that the _new tracker_ can safely build its list of tracked files based on the +list of the _current tracker_. + +NOTE: MIGRATION tracker should NOT be set at global configuration. To use it, follow below section +about setting Store File Tacking at Table or Column Family configuration. + + +### Configuring for Table or Column Family + +Setting Store File Tracking configuration globally may not always be possible or desired, for example, +in the case of upgraded clusters with pre-existing user data. +Store File Tracking can be set at Table or Column Family level configuration. +For example, to specify *FILE* implementation in the table configuration at table creation time, +the following should be applied: + +---- +create 'my-table', 'f1', 'f2', {CONFIGURATION => {'hbase.store.file-tracker.impl' => 'FILE'}} +---- + +To define *FILE* for an specific Column Family: + +---- +create 'my-table', {NAME=> '1', CONFIGURATION => {'hbase.store.file-tracker.impl' => 'FILE'}} +---- + +### Switching trackers at Table or Column Family + +A very common scenario is to set Store File Tracking on pre-existing HBase deployments that have +been upgraded to a version that supports this feature. To apply the FILE tracker, tables effectively +need to be migrated from the DEFAULT tracker to the FILE tracker. As explained previously, such +process requires the usage of the special MIGRATION tracker implementation, which can only be +specified at table or Column Family level. + +For example, to switch _tracker_ from *DEFAULT* to *FILE* in a table configuration: + +---- +alter 'my-table', CONFIGURATION => {'hbase.store.file-tracker.impl' => 'MIGRATION', +'hbase.store.file-tracker.migration.src.impl' => 'DEFAULT', +'hbase.store.file-tracker.migration.dst.impl' => 'FILE'} +---- + +To apply similar switch at column family level configuration: + +---- +alter 'my-table', {NAME => 'f1', CONFIGURATION => {'hbase.store.file-tracker.impl' => 'MIGRATION', +'hbase.store.file-tracker.migration.src.impl' => 'DEFAULT', +'hbase.store.file-tracker.migration.dst.impl' => 'FILE'}} +---- + +Once all table regions have been onlined again, don't forget to disable MIGRATION, by now setting +*hbase.store.file-tracker.migration.dst.impl* value as the *hbase.store.file-tracker.impl*. In the above +example, that would be as follows: + +---- +alter 'my-table', CONFIGURATION => {'hbase.store.file-tracker.impl' => 'FILE'} +---- diff --git a/src/main/asciidoc/book.adoc b/src/main/asciidoc/book.adoc index a622786..b8c648e 100644 --- a/src/main/asciidoc/book.adoc +++ b/src/main/asciidoc/book.adoc @@ -89,6 +89,7 @@ include::_chapters/zookeeper.adoc[] include::_chapters/community.adoc[] include::_chapters/hbtop.adoc[] include::_chapters/tracing.adoc[] +include::_chapters/store_file_tracking.adoc[] = Appendix