[29/33] hbase git commit: Tying up loose ends

misty Mon, 12 Jan 2015 23:50:49 -0800
http://git-wip-us.apache.org/repos/asf/hbase/blob/abaea39e/src/main/asciidoc/_chapters/datamodel.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/datamodel.adoc 
b/src/main/asciidoc/_chapters/datamodel.adoc
new file mode 100644
index 0000000..8bca69b
--- /dev/null
+++ b/src/main/asciidoc/_chapters/datamodel.adoc
@@ -0,0 +1,585 @@
+////
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+////
+
+[[datamodel]]
+= Data Model
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+In HBase, data is stored in tables, which have rows and columns.
+This terminology overlaps with that of relational databases (RDBMSs), but the analogy is not a helpful one.
+Instead, it can be helpful to think of an HBase table as a multi-dimensional map.
+
+.HBase Data Model Terminology
+
+Table::
+  An HBase table consists of multiple rows.
+
+Row::
+  A row in HBase consists of a row key and one or more columns with values 
associated with them.
+  Rows are sorted lexicographically by row key as they are stored.
+  For this reason, the design of the row key is very important.
+  The goal is to store data in such a way that related rows are near each 
other.
+  A common row key pattern is a website domain.
+  If your row keys are domains, you should probably store them in reverse 
(org.apache.www, org.apache.mail, org.apache.jira). This way, all of the Apache 
domains are near each other in the table, rather than being spread out based on 
the first letter of the subdomain.
+
+Column::
+  A column in HBase consists of a column family and a column qualifier, which 
are delimited by a [literal]+:+ (colon) character.
+
+Column Family::
+  Column families physically colocate a set of columns and their values, often 
for performance reasons.
+  Each column family has a set of storage properties, such as whether its 
values should be cached in memory, how its data is compressed or its row keys 
are encoded, and others.
+  Each row in a table has the same column families, though a given row might 
not store anything in a given column family.
+
+Column Qualifier::
+  A column qualifier is added to a column family to provide the index for a 
given piece of data.
+  Given a column family [literal]+content+, a column qualifier might be 
[literal]+content:html+, and another might be [literal]+content:pdf+.
+  Though column families are fixed at table creation, column qualifiers are 
mutable and may differ greatly between rows.
+
+Cell::
+  A cell is a combination of row, column family, and column qualifier, and 
contains a value and a timestamp, which represents the value's version.
+
+Timestamp::
+  A timestamp is written alongside each value, and is the identifier for a 
given version of a value.
+  By default, the timestamp represents the time on the RegionServer when the 
data was written, but you can specify a different timestamp value when you put 
data into the cell.
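+
+The reversed-domain row key pattern described under Row can be sketched in plain Java. This is only an illustration: the class and method names are hypothetical and not part of the HBase API, and a [code]+TreeSet+ stands in for the sorted order of rows in a table.
+

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not an HBase class.
final class ReverseDomainKey {
    /** Reverses the dot-separated labels of a host name: www.apache.org -> org.apache.www. */
    static String reverse(String domain) {
        List<String> labels = new ArrayList<>(Arrays.asList(domain.split("\\.")));
        Collections.reverse(labels);
        return String.join(".", labels);
    }

    public static void main(String[] args) {
        // A TreeSet keeps its elements sorted, standing in for sorted row keys.
        TreeSet<String> rowKeys = new TreeSet<>();
        for (String d : new String[] {
                "www.apache.org", "mail.apache.org", "jira.apache.org", "www.example.com"}) {
            rowKeys.add(reverse(d));
        }
        // prints [com.example.www, org.apache.jira, org.apache.mail, org.apache.www]
        System.out.println(rowKeys);
    }
}
```

+
+With the keys reversed, the three Apache hosts sort next to each other rather than being separated by the first letter of their subdomain.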
+
+[[conceptual.view]]
+== Conceptual View
+
+You can read a very understandable explanation of the HBase data model in the blog post link:http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable[Understanding HBase and BigTable] by Jim R. Wilson.
+Another good explanation is available in the PDF link:http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/9353-login1210_khurana.pdf[Introduction to Basic Schema Design] by Amandeep Khurana.
+It may help to read different perspectives to get a solid understanding of 
HBase schema design.
+The linked articles cover the same ground as the information in this section.
+
+The following example is a slightly modified form of the one on page 2 of the 
link:http://research.google.com/archive/bigtable.html[BigTable] paper.
+There is a table called [var]+webtable+ that contains two rows ([literal]+com.cnn.www+ and [literal]+com.example.www+) and three column families named [var]+contents+, [var]+anchor+, and [var]+people+.
+In this example, for the first row ([literal]+com.cnn.www+), [var]+anchor+ contains two columns ([var]+anchor:cnnsi.com+, [var]+anchor:my.look.ca+) and [var]+contents+ contains one column ([var]+contents:html+). This example contains 5 versions of the row with the row key [literal]+com.cnn.www+, and one version of the row with the row key [literal]+com.example.www+.
+The [var]+contents:html+ column qualifier contains the entire HTML of a given 
website.
+Qualifiers of the [var]+anchor+ column family each contain the external site 
which links to the site represented by the row, along with the text it used in 
the anchor of its link.
+The [var]+people+ column family represents people associated with the site. 
+
+.Column Names
+[NOTE]
+====
+By convention, a column name is made of its column family prefix and a 
_qualifier_.
+For example, the column _contents:html_ is made up of the column family 
[var]+contents+ and the [var]+html+ qualifier.
+The colon character ([literal]+:+) delimits the column family from the column 
family _qualifier_. 
+====
+
+.Table [var]+webtable+
+[cols="1,1,1,1,1", frame="all", options="header"]
+|===
+| Row Key
+| Time Stamp
+| ColumnFamily contents
+| ColumnFamily anchor
+| ColumnFamily people
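+
+| "com.cnn.www" | t9 | | anchor:cnnsi.com = "CNN" |
+| "com.cnn.www" | t8 | | anchor:my.look.ca = "CNN.com" |
+| "com.cnn.www" | t6 | contents:html = "<html>..." | |
+| "com.cnn.www" | t5 | contents:html = "<html>..." | |
+| "com.cnn.www" | t3 | contents:html = "<html>..." | |
+| "com.example.www" | t5 | contents:html = "<html>..." | | people:author = "John Doe"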
+|===
+
+Cells in this table that appear to be empty do not take space, or in fact 
exist, in HBase.
+This is what makes HBase "sparse." A tabular view is not the only possible way 
to look at data in HBase, or even the most accurate.
+The following represents the same information as a multi-dimensional map.
+This is only a mock-up for illustrative purposes and may not be strictly 
accurate.
+
+[source]
+----
+
+{
+       "com.cnn.www": {
+               contents: {
+                       t6: contents:html: "<html>..."
+                       t5: contents:html: "<html>..."
+                       t3: contents:html: "<html>..."
+               }
+               anchor: {
+                       t9: anchor:cnnsi.com = "CNN"
+                       t8: anchor:my.look.ca = "CNN.com"
+               }
+               people: {}
+       }
+       "com.example.www": {
+               contents: {
+                       t5: contents:html: "<html>..."
+               }
+               anchor: {}
+               people: {
+                       t5: people:author: "John Doe"
+               }
+       }
+}
+----
+
+[[physical.view]]
+== Physical View
+
+Although at a conceptual level tables may be viewed as a sparse set of rows, 
they are physically stored by column family.
+A new column qualifier (column_family:column_qualifier) can be added to an 
existing column family at any time.
+
+.ColumnFamily [var]+anchor+
+[cols="1,1,1", frame="all", options="header"]
+|===
+| Row Key
+| Time Stamp
+| Column Family anchor
+
+| "com.cnn.www" | t9 | anchor:cnnsi.com = "CNN"
+| "com.cnn.www" | t8 | anchor:my.look.ca = "CNN.com"
+|===
+
+.ColumnFamily [var]+contents+
+[cols="1,1,1", frame="all", options="header"]
+|===
+| Row Key
+| Time Stamp
+| ColumnFamily contents
+
+| "com.cnn.www" | t6 | contents:html = "<html>..."
+| "com.cnn.www" | t5 | contents:html = "<html>..."
+| "com.cnn.www" | t3 | contents:html = "<html>..."
+|===
+
+The empty cells shown in the conceptual view are not stored at all.
+Thus a request for the value of the [var]+contents:html+ column at time stamp 
[literal]+t8+ would return no value.
+Similarly, a request for an [var]+anchor:my.look.ca+ value at time stamp 
[literal]+t9+ would return no value.
+However, if no timestamp is supplied, the most recent value for a particular 
column would be returned.
+Given multiple versions, the most recent is also the first one found,  since 
timestamps are stored in descending order.
+Thus a request for the values of all columns in the row [var]+com.cnn.www+, with no timestamp specified, would return the value of [var]+contents:html+ from timestamp [literal]+t6+, the value of [var]+anchor:cnnsi.com+ from timestamp [literal]+t9+, and the value of [var]+anchor:my.look.ca+ from timestamp [literal]+t8+.
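+
+The read behavior described in this section can be mimicked with nested sorted maps. The following is a minimal sketch in plain Java, not HBase code: [code]+SparseTableModel+ and its methods are hypothetical names, and the numbers 3, 5, 6, 8, and 9 stand in for the timestamps t3 through t9.
+

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a sparse table; not an HBase class.
final class SparseTableModel {
    // row key -> column (family:qualifier) -> timestamp (newest first) -> value
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, TreeMap<Long, String>>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    /** Returns the newest value of the column, or null if nothing was ever stored. */
    String get(String row, String column) {
        TreeMap<Long, String> versions = versions(row, column);
        // Versions are kept in descending order, so the first entry is the newest.
        return versions == null || versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    /** Returns the value stored at exactly this timestamp, or null if there is none. */
    String get(String row, String column, long timestamp) {
        TreeMap<Long, String> versions = versions(row, column);
        return versions == null ? null : versions.get(timestamp);
    }

    private TreeMap<Long, String> versions(String row, String column) {
        Map<String, TreeMap<Long, String>> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }
}
```

+
+Cells that were never written simply have no entry in the map, which is the sense in which the table is sparse: asking for [var]+contents:html+ at timestamp 8 returns nothing, while asking without a timestamp returns the entry stored at 6.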
+
+For more information about the internals of how Apache HBase stores data, see 
<<regions.arch,regions.arch>>. 
+
+== Namespace
+
+A namespace is a logical grouping of tables analogous to a database in relational database systems.
+This abstraction lays the groundwork for upcoming multi-tenancy related features:
+
+* Quota Management (HBASE-8410) - Restrict the amount of resources (i.e. regions, tables) a namespace can consume.
+* Namespace Security Administration (HBASE-9206) - Provide another level of security administration for tenants.
+* Region server groups (HBASE-6721) - A namespace/table can be pinned onto a subset of RegionServers, thus guaranteeing a coarse level of isolation.
+
+[[namespace_creation]]
+=== Namespace management
+
+A namespace can be created, removed or altered.
+Namespace membership is determined during table creation by specifying a 
fully-qualified table name of the form:
+
+[source,xml]
+----
+<table namespace>:<table qualifier>
+----
+
+.Examples
+====
+[source,bourne]
+----
+
+#Create a namespace
+create_namespace 'my_ns'
+----
+
+[source,bourne]
+----
+
+#create my_table in my_ns namespace
+create 'my_ns:my_table', 'fam'
+----
+
+[source,bourne]
+----
+
+#drop namespace
+drop_namespace 'my_ns'
+----
+
+[source,bourne]
+----
+
+#alter namespace
+alter_namespace 'my_ns', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
+----
+====
+
+[[namespace_special]]
+=== Predefined namespaces
+
+There are two predefined special namespaces: 
+
+* hbase - system namespace, used to contain HBase internal tables
+* default - tables with no explicitly specified namespace automatically fall into this namespace
+
+.Examples
+====
+[source,bourne]
+----
+
+#namespace=foo and table qualifier=bar
+create 'foo:bar', 'fam'
+
+#namespace=default and table qualifier=bar
+create 'bar', 'fam'
+----
+====
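+
+The naming rule above (an optional namespace, a colon, then the table qualifier, with [literal]+default+ used when no namespace is given) can be sketched as a small parser. This is an illustration only: [code]+QualifiedTableName+ is a hypothetical class, not the HBase client's own table name type, and it performs no validation of legal characters.
+

```java
// Hypothetical illustration of <table namespace>:<table qualifier>; not an HBase class.
final class QualifiedTableName {
    static final String DEFAULT_NAMESPACE = "default";

    final String namespace;
    final String qualifier;

    private QualifiedTableName(String namespace, String qualifier) {
        this.namespace = namespace;
        this.qualifier = qualifier;
    }

    /** Splits "ns:table" at the first colon; a name without a colon goes to the default namespace. */
    static QualifiedTableName parse(String name) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return new QualifiedTableName(DEFAULT_NAMESPACE, name);
        }
        return new QualifiedTableName(name.substring(0, colon), name.substring(colon + 1));
    }
}
```
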
+
+== Table
+
+Tables are declared up front at schema definition time. 
+
+== Row
+
+Row keys are uninterpreted bytes.
+Rows are lexicographically sorted with the lowest order appearing first in a table.
+The empty byte array is used to denote both the start and end of a table's namespace.
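+
+Because row keys are compared byte by byte, numeric suffixes do not sort numerically unless they are padded. The sketch below illustrates this in plain Java; the class and method names are hypothetical, and [code]+Arrays.compareUnsigned+ (available from Java 9) is used here as a stand-in for lexicographic byte ordering.
+

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper illustrating lexicographic ordering of row keys.
final class RowKeyOrder {
    /** Compares two keys byte by byte, treating each byte as unsigned. */
    static int compare(String a, String b) {
        return Arrays.compareUnsigned(
            a.getBytes(StandardCharsets.UTF_8),
            b.getBytes(StandardCharsets.UTF_8));
    }

    /** Builds a key such as row02 so that numeric order and byte order agree. */
    static String padded(long id, int width) {
        return String.format("row%0" + width + "d", id);
    }
}
```

+
+Here [code]+compare("row10", "row2")+ is negative, so [literal]+row10+ sorts before [literal]+row2+, whereas the zero-padded keys [literal]+row02+ and [literal]+row10+ sort in the expected numeric order.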
+
+[[columnfamily]]
+== Column Family
+
+Columns in Apache HBase are grouped into _column families_.
+All column members of a column family have the same prefix.
+For example, the columns _courses:history_ and _courses:math_ are both members 
of the _courses_ column family.
+The colon character ([literal]+:+) delimits the column family from the 
+column family qualifier.
+The column family prefix must be composed of _printable_ characters.
+The qualifying tail, the column family _qualifier_, can be made of any 
arbitrary bytes.
+Column families must be declared up front at schema definition time, whereas columns do not need to be defined at schema time but can be conjured on the fly while the table is up and running.
+
+Physically, all column family members are stored together on the filesystem.
+Because tunings and storage specifications are done at the column family 
level, it is advised that all column family members have the same general 
access pattern and size characteristics.
+
+== Cells
+
+A _{row, column, version}_ tuple exactly specifies a [literal]+cell+ in HBase.
+Cell content is uninterpreted bytes.
+
+== Data Model Operations
+
+The four primary data model operations are Get, Put, Scan, and Delete.
+Operations are applied via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table] instances.
+
+=== Get
+
+link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] returns attributes for a specified row.
+Gets are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#get(org.apache.hadoop.hbase.client.Get)[Table.get].
+
+=== Put
+
+link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or updates existing rows (if the key already exists). Puts are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put(org.apache.hadoop.hbase.client.Put)[Table.put] (writeBuffer) or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch(java.util.List, java.lang.Object[])[Table.batch] (non-writeBuffer).
+
+[[scan]]
+=== Scans
+
+link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] allows iteration over multiple rows for specified attributes.
+
+The following is an example of a Scan on a Table instance.
+Assume that a table is populated with rows with keys "row1", "row2", "row3", 
and then another set of rows with the keys "abc1", "abc2", and "abc3". The 
following example shows how to set a Scan instance to return the rows beginning 
with "row".
+
+[source,java]
+----
+
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+
+Table table = ...      // instantiate a Table instance
+
+Scan scan = new Scan();
+scan.addColumn(CF, ATTR);
+scan.setRowPrefixFilter(Bytes.toBytes("row"));
+ResultScanner rs = table.getScanner(scan);
+try {
+  for (Result r = rs.next(); r != null; r = rs.next()) {
+    // process result...
+  }
+} finally {
+  rs.close();  // always close the ResultScanner!
+}
+----
+
+Note that generally the easiest way to specify a specific stop point for a scan is by using the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/InclusiveStopFilter.html[InclusiveStopFilter] class.
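+
+One way to picture a prefix scan is as a range scan that starts at the prefix itself (inclusive) and stops at the smallest key that no longer begins with the prefix (exclusive). The helper below is a hypothetical sketch of how such a stop row could be derived; it is an assumption made for illustration, not a description of how [code]+setRowPrefixFilter+ is implemented.
+

```java
import java.util.Arrays;

// Hypothetical helper; not part of the HBase client API.
final class PrefixRange {
    /**
     * Returns the smallest key that sorts after every key starting with the prefix.
     * An empty array means "no upper bound", i.e. scan to the end of the table.
     */
    static byte[] stopRowFor(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0]; // empty prefix, or a prefix made only of 0xFF bytes
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // increment the last remaining byte
        return stop;
    }
}
```

+
+For the prefix [literal]+row+ this yields [literal]+rox+, so a scan over the range from [literal]+row+ up to but excluding [literal]+rox+ covers "row1", "row2", and "row3" and skips "abc1", "abc2", and "abc3".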
+
+=== Delete
+
+link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html[Delete] removes a row from a table.
+Deletes are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete(org.apache.hadoop.hbase.client.Delete)[Table.delete].
+
+HBase does not modify data in place, and so deletes are handled by creating 
new markers called _tombstones_.
+These tombstones, along with the dead values, are cleaned up on major 
compactions. 
+
+See <<version.delete,version.delete>> for more information on deleting 
versions of columns, and see <<compaction,compaction>> for more information on 
compactions. 
+
+[[versions]]
+== Versions
+
+A _{row, column, version}_ tuple exactly specifies a [literal]+cell+ in HBase.
+It's possible to have an unbounded number of cells where the row and column 
are the same but the cell address differs only in its version dimension.
+
+While rows and column keys are expressed as bytes, the version is specified 
using a long integer.
+Typically this long contains time instances such as those returned by 
[code]+java.util.Date.getTime()+ or [code]+System.currentTimeMillis()+, that 
is: [quote]_the difference, measured in milliseconds, between the current time 
and midnight, January 1, 1970 UTC_.
+
+The HBase version dimension is stored in decreasing order, so that when 
reading from a store file, the most recent values are found first.
+
+There is a lot of confusion over the semantics of [literal]+cell+ versions in HBase.
+In particular:
+
+* If multiple writes to a cell have the same version, only the last written is 
fetchable.
+* It is OK to write cells in a non-increasing version order.
+
+Below we describe how the version dimension in HBase currently works.
+See link:https://issues.apache.org/jira/browse/HBASE-2406[HBASE-2406] for discussion of HBase versions.
+link:http://outerthought.org/blog/417-ot.html[Bending time in HBase] makes for a good read on the version, or time, dimension in HBase.
+It has more detail on versioning than is provided here.
+As of this writing, the limitation _Overwriting values at existing timestamps_ mentioned in the article no longer holds in HBase.
+This section is basically a synopsis of this article by Bruno Dumon.
+
+[[specify.number.of.versions]]
+=== Specifying the Number of Versions to Store
+
+The maximum number of versions to store for a given column is part of the column schema and is specified at table creation, or via an +alter+ command. The default is given by [code]+HColumnDescriptor.DEFAULT_VERSIONS+.
+Prior to HBase 0.96, the default number of versions kept was [literal]+3+, but in 0.96 and newer it is [literal]+1+.
+
+.Modify the Maximum Number of Versions for a Column
+====
+This example uses HBase Shell to keep a maximum of 5 versions of all columns in column family [code]+f1+.
+You could also use link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
+
+----
+hbase> alter 't1', NAME => 'f1', VERSIONS => 5
+----
+====
+
+.Modify the Minimum Number of Versions for a Column
+====
+You can also specify the minimum number of versions to store.
+By default, this is set to 0, which means the feature is disabled.
+The following example sets the minimum number of versions for all columns in column family [code]+f1+ to [literal]+2+, via HBase Shell.
+You could also use link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
+
+----
+hbase> alter 't1', NAME => 'f1', MIN_VERSIONS => 2
+----
+====
+
+Starting with HBase 0.98.2, you can specify a global default for the maximum 
number of versions kept for all newly-created columns, by setting 
+hbase.column.max.version+ in [path]_hbase-site.xml_.
+See <<hbase.column.max.version,hbase.column.max.version>>.
+
+[[versions.ops]]
+=== Versions and HBase Operations
+
+In this section we look at the behavior of the version dimension for each of 
the core HBase operations.
+
+==== Get/Scan
+
+Gets are implemented on top of Scans.
+The below discussion of link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] applies equally to link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scans].
+
+By default, i.e. if you specify no explicit version, when doing a [literal]+get+, the cell whose version has the largest value is returned (which may or may not be the latest one written, see later). The default behavior can be modified in the following ways:
+
+* to return more than one version, see 
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions()[Get.setMaxVersions()]
+* to return versions other than the latest, see link:???[Get.setTimeRange()]
++
+To retrieve the latest version that is less than or equal to a given value, 
thus giving the 'latest' state of the record at a certain point in time, just 
use a range from 0 to the desired version and set the max versions to 1.
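+
+The "latest state as of a given version" lookup described above amounts to finding the largest stored version that does not exceed the requested one. The following plain-Java sketch shows that idea on a sorted map; [code]+AsOfRead+ is a hypothetical name and the map stands in for the versions of a single cell.
+

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a point-in-time read over one cell's versions.
final class AsOfRead {
    /** Returns the value with the largest version <= asOf, or null if there is none. */
    static String latestAtOrBefore(TreeMap<Long, String> versions, long asOf) {
        Map.Entry<Long, String> entry = versions.floorEntry(asOf);
        return entry == null ? null : entry.getValue();
    }
}
```

+
+With versions 3, 5, and 6 stored, a lookup as of 4 returns the value written at 3, and a lookup as of 2 returns nothing. When using the real client, check in the API documentation whether the upper bound of the time range is inclusive or exclusive before relying on "less than or equal".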
+
+
+==== Default Get Example
+
+The following Get will only retrieve the current version of the row.
+
+[source,java]
+----
+
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+Get get = new Get(Bytes.toBytes("row1"));
+Result r = table.get(get);
+byte[] b = r.getValue(CF, ATTR);  // returns current version of value
+----
+
+==== Versioned Get Example
+
+The following Get will return the last 3 versions of the row.
+
+[source,java]
+----
+
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+Get get = new Get(Bytes.toBytes("row1"));
+get.setMaxVersions(3);  // will return last 3 versions of row
+Result r = table.get(get);
+byte[] b = r.getValue(CF, ATTR);  // returns current version of value
+List<KeyValue> kv = r.getColumn(CF, ATTR);  // returns all versions of this column
+----
+
+==== Put
+
+Doing a put always creates a new version of a [literal]+cell+, at a certain 
timestamp.
+By default the system uses the server's [literal]+currentTimeMillis+, but you 
can specify the version (= the long integer) yourself, on a per-column level.
+This means you could assign a time in the past or the future, or use the long 
value for non-time purposes.
+
+To overwrite an existing value, do a put at exactly the same row, column, and 
version as that of the cell you would overshadow.
+
+===== Implicit Version Example
+
+The following Put will be implicitly versioned by HBase with the current time.
+
+[source,java]
+----
+
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+Put put = new Put(Bytes.toBytes(row));
+put.add(CF, ATTR, Bytes.toBytes( data));
+table.put(put);
+----
+
+===== Explicit Version Example
+
+The following Put has the version timestamp explicitly set.
+
+[source,java]
+----
+
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+Put put = new Put( Bytes.toBytes(row));
+long explicitTimeInMs = 555;  // just an example
+put.add(CF, ATTR, explicitTimeInMs, Bytes.toBytes(data));
+table.put(put);
+----
+
+Caution: the version timestamp is used internally by HBase for things like time-to-live calculations.
+It's usually best to avoid setting this timestamp yourself.
+Prefer using a separate timestamp attribute of the row, or make the timestamp a part of the row key, or both.
+
+[[version.delete]]
+==== Delete
+
+There are three different types of internal delete markers.
+See Lars Hofhansl's blog for discussion of his attempt at adding another, link:http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html[Scanning in HBase: Prefix Delete Marker].
+
+* Delete: for a specific version of a column.
+* Delete column: for all versions of a column.
+* Delete family: for all columns of a particular ColumnFamily
+
+When deleting an entire row, HBase will internally create a tombstone for each 
ColumnFamily (i.e., not each individual column). 
+
+Deletes work by creating _tombstone_ markers.
+For example, let's suppose we want to delete a row.
+For this you can specify a version, or else by default the 
[literal]+currentTimeMillis+ is used.
+What this means is [quote]_delete all cells where the version is less than or equal to this version_.
+HBase never modifies data in place, so for example a delete will not 
immediately delete (or mark as deleted) the entries in the storage file that 
correspond to the delete condition.
+Rather, a so-called _tombstone_ is written, which will mask the deleted values.
+When HBase does a major compaction, the tombstones are processed to actually 
remove the dead values, together with the tombstones themselves.
+If the version you specified when deleting a row is larger than the version of 
any value in the row, then you can consider the complete row to be deleted.
+
+For an informative discussion on how deletes and versioning interact, see the thread link:http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/28421[Put w/ timestamp -> Deleteall -> Put w/ timestamp fails] on the user mailing list.
+
+Also see <<keyvalue,keyvalue>> for more information on the internal KeyValue 
format. 
+
+Delete markers are purged during the next major compaction of the store, 
unless the +KEEP_DELETED_CELLS+ option is set in the column family.
+To keep the deletes for a configurable amount of time, you can set the delete 
TTL via the +hbase.hstore.time.to.purge.deletes+ property in 
[path]_hbase-site.xml_.
+If +hbase.hstore.time.to.purge.deletes+ is not set, or set to 0, all delete 
markers, including those with timestamps in the future, are purged during the 
next major compaction.
+Otherwise, a delete marker with a timestamp in the future is kept until the 
major compaction which occurs after the time represented by the marker's 
timestamp plus the value of +hbase.hstore.time.to.purge.deletes+, in 
milliseconds. 
+
+NOTE: This behavior represents a fix for an unexpected change that was 
introduced in HBase 0.94, and was fixed in 
link:https://issues.apache.org/jira/browse/HBASE-10118[HBASE-10118].
+The change has been backported to HBase 0.94 and newer branches.
+
+=== Current Limitations
+
+==== Deletes mask Puts
+
+Deletes mask puts, even puts that happened after the delete was entered.
+See link:https://issues.apache.org/jira/browse/HBASE-2256[HBASE-2256].
+Remember that a delete writes a tombstone, which only disappears after the next major compaction has run.
+Suppose you do a delete of everything <= T.
+After this you do a new put with a timestamp <= T.
+This put, even if it happened after the delete, will be masked by the delete tombstone.
+Performing the put will not fail, but when you do a get you will notice the put had no effect.
+It will start working again after the major compaction has run.
+These issues should not be a problem if you use always-increasing versions for new puts to a row.
+But they can occur even if you do not care about time: just do a delete and a put immediately after each other, and there is some chance they happen within the same millisecond.
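+
+The masking effect can be reproduced with a toy model of a single column. The sketch below is a deliberate simplification written in plain Java: [code]+TombstoneModel+ is a hypothetical class, it tracks only one "delete everything up to T" marker, and it ignores the different delete marker types and column families that HBase actually distinguishes.
+

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-column model of tombstone masking; not HBase code.
final class TombstoneModel {
    // version -> value, newest version first
    private final TreeMap<Long, String> cells = new TreeMap<Long, String>(Comparator.reverseOrder());
    // Highest "delete everything <= T" marker written so far; null when none exists.
    private Long tombstone;

    void put(long version, String value) {
        cells.put(version, value);
    }

    /** Writes a tombstone instead of removing anything in place. */
    void deleteUpTo(long version) {
        if (tombstone == null || version > tombstone) {
            tombstone = version;
        }
    }

    /** Returns the newest value that is not masked by the tombstone, or null. */
    String get() {
        Map.Entry<Long, String> newest = cells.firstEntry();
        if (newest == null) {
            return null;
        }
        if (tombstone != null && newest.getKey() <= tombstone) {
            return null; // the newest cell is masked, so every older cell is masked too
        }
        return newest.getValue();
    }

    /** Removes the masked cells and then the tombstone itself. */
    void majorCompact() {
        if (tombstone != null) {
            final long limit = tombstone;
            cells.keySet().removeIf(version -> version <= limit);
            tombstone = null;
        }
    }
}
```

+
+In this model, a put at version 7 that follows [code]+deleteUpTo(10)+ stays invisible to [code]+get()+ until [code]+majorCompact()+ has run and a new put is made, which mirrors the sequence described above.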
+
+[[major.compactions.change.query.results]]
+==== Major compactions change query results
+
+[quote]_...create three cell versions at t1, t2 and t3, with a maximum-versions setting of 2. So when getting all versions, only the values at t2 and t3 will be returned. But if you delete the version at t2 or t3, the one at t1 will appear again. Obviously, once a major compaction has run, such behavior will not be the case anymore..._ (See _Garbage Collection_ in link:http://outerthought.org/blog/417-ot.html[Bending time in HBase].)
+
+[[dm.sort]]
+== Sort Order
+
+All data model operations in HBase return data in sorted order: first by row, then by ColumnFamily, followed by column qualifier, and finally timestamp (sorted in reverse, so newest records are returned first).
+
+[[dm.column.metadata]]
+== Column Metadata
+
+There is no store of column metadata outside of the internal KeyValue 
instances for a ColumnFamily.
+Thus, while HBase can support not only a large number of columns per row, but a heterogeneous set of columns between rows as well, it is your responsibility to keep track of the column names.
+
+The only way to get a complete set of columns that exist for a ColumnFamily is 
to process all the rows.
+For more information about how HBase stores data internally, see 
<<keyvalue,keyvalue>>. 
+
+== Joins
+
+Whether HBase supports joins is a common question on the dist-list, and there is a simple answer: it doesn't, at least not in the way that RDBMSs support them (e.g., with equi-joins or outer-joins in SQL). As has been illustrated in this chapter, the read data model operations in HBase are Get and Scan.
+
+However, that doesn't mean that equivalent join functionality can't be supported in your application, but you have to do it yourself.
+The two primary strategies are either denormalizing the data upon writing to HBase, or having lookup tables and doing the join between HBase tables in your application or MapReduce code (and as RDBMSs demonstrate, there are several strategies for this depending on the size of the tables, e.g., nested loops vs. hash-joins). So which is the best approach? It depends on what you are trying to do, and as such there isn't a single answer that works for every use case.
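+
+As a rough illustration of the lookup-table strategy, the following plain-Java sketch performs a hash join in application code. Everything in it is hypothetical: the "orders" and "customers" data stand in for rows that an application would already have read from two HBase tables, and no HBase API is involved.
+

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical application-side join; the inputs stand in for data read from two tables.
final class ClientSideJoin {
    /**
     * Inner hash join. Each order is {orderId, customerId}; customerNames is the
     * lookup table (customerId -> name). Orders without a matching customer are skipped.
     */
    static List<String> join(List<String[]> orders, Map<String, String> customerNames) {
        List<String> joined = new ArrayList<>();
        for (String[] order : orders) {
            String name = customerNames.get(order[1]); // probe the lookup table
            if (name != null) {
                joined.add(order[0] + "," + name);
            }
        }
        return joined;
    }
}
```

+
+This approach suits a lookup table small enough to hold in memory; for larger tables, denormalizing at write time or joining in MapReduce code may be a better fit.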
+
+== ACID
+
+See link:http://hbase.apache.org/acid-semantics.html[ACID Semantics].
+Lars Hofhansl has also written a note on 
link:http://hadoop-hbase.blogspot.com/2012/03/acid-in-hbase.html[ACID in HBase].
+
+ifdef::backend-docbook[]
+[index]
+== Index
+// Generated automatically by the DocBook toolchain.
+endif::backend-docbook[]