http://git-wip-us.apache.org/repos/asf/hbase/blob/abaea39e/src/main/asciidoc/_chapters/datamodel.adoc
----------------------------------------------------------------------
diff --git a/src/main/asciidoc/_chapters/datamodel.adoc
b/src/main/asciidoc/_chapters/datamodel.adoc
new file mode 100644
index 0000000..8bca69b
--- /dev/null
+++ b/src/main/asciidoc/_chapters/datamodel.adoc
@@ -0,0 +1,585 @@
+////
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+////
+
+[[datamodel]]
+= Data Model
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+In HBase, data is stored in tables, which have rows and columns.
+This is a terminology overlap with relational databases (RDBMSs), but this is
not a helpful analogy.
+Instead, it can be helpful to think of an HBase table as a multi-dimensional
map.
+
+.HBase Data Model Terminology
+
+Table::
+ An HBase table consists of multiple rows.
+
+Row::
+ A row in HBase consists of a row key and one or more columns with values
associated with them.
+ Rows are sorted alphabetically by the row key as they are stored.
+ For this reason, the design of the row key is very important.
+ The goal is to store data in such a way that related rows are near each
other.
+ A common row key pattern is a website domain.
+ If your row keys are domains, you should probably store them in reverse
(org.apache.www, org.apache.mail, org.apache.jira). This way, all of the Apache
domains are near each other in the table, rather than being spread out based on
the first letter of the subdomain.
+
+Column::
+ A column in HBase consists of a column family and a column qualifier, which
are delimited by a [literal]+:+ (colon) character.
+
+Column Family::
+ Column families physically colocate a set of columns and their values, often
for performance reasons.
+ Each column family has a set of storage properties, such as whether its
values should be cached in memory, how its data is compressed or its row keys
are encoded, and others.
+ Each row in a table has the same column families, though a given row might
not store anything in a given column family.
+
+Column Qualifier::
+ A column qualifier is added to a column family to provide the index for a
given piece of data.
+ Given a column family [literal]+content+, a column qualifier might be
[literal]+content:html+, and another might be [literal]+content:pdf+.
+ Though column families are fixed at table creation, column qualifiers are
mutable and may differ greatly between rows.
+
+Cell::
+ A cell is a combination of row, column family, and column qualifier, and
contains a value and a timestamp, which represents the value's version.
+
+Timestamp::
+ A timestamp is written alongside each value, and is the identifier for a
given version of a value.
+ By default, the timestamp represents the time on the RegionServer when the
data was written, but you can specify a different timestamp value when you put
data into the cell.
+
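+The reversed-domain row key pattern described under _Row_ can be sketched in plain Java.
+This is only an illustration: the class [code]+ReversedDomainKey+ and its methods are hypothetical helpers, not part of the HBase API, and a sorted set of strings stands in for the sorted rows of a table.
+

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of the HBase client API.
class ReversedDomainKey {
    // Reverse the dot-separated labels: "www.apache.org" -> "org.apache.www".
    static String reverse(String domain) {
        List<String> labels = Arrays.asList(domain.split("\\."));
        Collections.reverse(labels);
        return String.join(".", labels);
    }

    // Build a sorted set of reversed keys; the TreeSet stands in for sorted rows.
    static TreeSet<String> rowKeysFor(String... domains) {
        TreeSet<String> keys = new TreeSet<>();
        for (String domain : domains) {
            keys.add(reverse(domain));
        }
        return keys;
    }
}
```

+With keys built this way, the three Apache hosts sort next to each other, and a host from another domain sorts apart from them.
+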
+[[conceptual.view]]
+== Conceptual View
+
+You can read a very understandable explanation of the HBase data model in the
blog post
link:http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable[Understanding
+ HBase and BigTable] by Jim R. Wilson.
+Another good explanation is available in the PDF
link:http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/9353-login1210_khurana.pdf[Introduction
+ to Basic Schema Design] by Amandeep Khurana.
+It may help to read different perspectives to get a solid understanding of
HBase schema design.
+The linked articles cover the same ground as the information in this section.
+
+The following example is a slightly modified form of the one on page 2 of the
link:http://research.google.com/archive/bigtable.html[BigTable] paper.
+There is a table called [var]+webtable+ that contains two rows
([literal]+com.cnn.www+ and [literal]+com.example.www+) and three column
families named [var]+contents+, [var]+anchor+, and [var]+people+.
+In this example, for the first row ([literal]+com.cnn.www+), [var]+anchor+
contains two columns ([var]+anchor:cnnsi.com+, [var]+anchor:my.look.ca+) and
[var]+contents+ contains one column ([var]+contents:html+). This example
contains 5 versions of the row with the row key [literal]+com.cnn.www+, and one
version of the row with the row key [literal]+com.example.www+.
+The [var]+contents:html+ column qualifier contains the entire HTML of a given
website.
+Qualifiers of the [var]+anchor+ column family each contain the external site
which links to the site represented by the row, along with the text it used in
the anchor of its link.
+The [var]+people+ column family represents people associated with the site.
+
+.Column Names
+[NOTE]
+====
+By convention, a column name is made of its column family prefix and a
_qualifier_.
+For example, the column _contents:html_ is made up of the column family
[var]+contents+ and the [var]+html+ qualifier.
+The colon character ([literal]+:+) delimits the column family from the column
family _qualifier_.
+====
+
+.Table [var]+webtable+
+[cols="1,1,1,1,1", frame="all", options="header"]
+|===
+|Row Key |Time Stamp |ColumnFamily contents |ColumnFamily anchor |ColumnFamily people
+|"com.cnn.www" |t9 | |anchor:cnnsi.com = "CNN" |
+|"com.cnn.www" |t8 | |anchor:my.look.ca = "CNN.com" |
+|"com.cnn.www" |t6 |contents:html = "<html>..." | |
+|"com.cnn.www" |t5 |contents:html = "<html>..." | |
+|"com.cnn.www" |t3 |contents:html = "<html>..." | |
+|"com.example.www" |t5 |contents:html = "<html>..." | |people:author = "John Doe"
+|===
+
+Cells in this table that appear to be empty do not take space, or in fact
exist, in HBase.
+This is what makes HBase "sparse." A tabular view is not the only possible way
to look at data in HBase, or even the most accurate.
+The following represents the same information as a multi-dimensional map.
+This is only a mock-up for illustrative purposes and may not be strictly
accurate.
+
+[source]
+----
+
+{
+ "com.cnn.www": {
+ contents: {
+ t6: contents:html: "<html>..."
+ t5: contents:html: "<html>..."
+ t3: contents:html: "<html>..."
+ }
+ anchor: {
+ t9: anchor:cnnsi.com = "CNN"
+ t8: anchor:my.look.ca = "CNN.com"
+ }
+ people: {}
+ }
+ "com.example.www": {
+ contents: {
+ t5: contents:html: "<html>..."
+ }
+ anchor: {}
+ people: {
+ t5: people:author: "John Doe"
+ }
+ }
+}
+----
+
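+The multi-dimensional map view can also be modelled with nested sorted maps.
+The following is a minimal in-memory sketch, not the HBase API: the class [code]+SparseTableSketch+ is hypothetical, and the integers 3, 5, 6, 8, and 9 stand in for the timestamps t3 through t9 of the mock-up.
+It shows that only cells that were written exist at all, which is what "sparse" means here.
+

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Toy model: row key -> column name -> timestamp -> value. Not the HBase API.
class SparseTableSketch {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
        new TreeMap<>();

    // Store one version of one cell; intermediate maps are created on demand.
    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, NavigableMap<Long, String>>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Columns that actually hold data for a row; absent columns are simply not there.
    Set<String> columnsOf(String row) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        return columns == null ? Collections.<String>emptySet() : columns.keySet();
    }

    // Total number of stored cell versions across all rows.
    int cellCount() {
        int n = 0;
        for (NavigableMap<String, NavigableMap<Long, String>> columns : rows.values()) {
            for (NavigableMap<Long, String> versions : columns.values()) {
                n += versions.size();
            }
        }
        return n;
    }

    // The webtable mock-up; integer timestamps are stand-ins for t3..t9.
    static SparseTableSketch webtable() {
        SparseTableSketch t = new SparseTableSketch();
        t.put("com.cnn.www", "contents:html", 6, "<html>...");
        t.put("com.cnn.www", "contents:html", 5, "<html>...");
        t.put("com.cnn.www", "contents:html", 3, "<html>...");
        t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN");
        t.put("com.cnn.www", "anchor:my.look.ca", 8, "CNN.com");
        t.put("com.example.www", "contents:html", 5, "<html>...");
        t.put("com.example.www", "people:author", 5, "John Doe");
        return t;
    }
}
```

+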
+[[physical.view]]
+== Physical View
+
+Although at a conceptual level tables may be viewed as a sparse set of rows,
they are physically stored by column family.
+A new column qualifier (column_family:column_qualifier) can be added to an
existing column family at any time.
+
+.ColumnFamily [var]+anchor+
+[cols="1,1,1", frame="all", options="header"]
+|===
+|Row Key |Time Stamp |ColumnFamily anchor
+|"com.cnn.www" |t9 |anchor:cnnsi.com = "CNN"
+|"com.cnn.www" |t8 |anchor:my.look.ca = "CNN.com"
+|===
+
+.ColumnFamily [var]+contents+
+[cols="1,1,1", frame="all", options="header"]
+|===
+|Row Key |Time Stamp |ColumnFamily contents
+|"com.cnn.www" |t6 |contents:html = "<html>..."
+|"com.cnn.www" |t5 |contents:html = "<html>..."
+|"com.cnn.www" |t3 |contents:html = "<html>..."
+|===
+
+The empty cells shown in the conceptual view are not stored at all.
+Thus a request for the value of the [var]+contents:html+ column at time stamp
[literal]+t8+ would return no value.
+Similarly, a request for an [var]+anchor:my.look.ca+ value at time stamp
[literal]+t9+ would return no value.
+However, if no timestamp is supplied, the most recent value for a particular
column would be returned.
+Given multiple versions, the most recent is also the first one found, since
timestamps are stored in descending order.
+Thus a request for the values of all columns in the row [var]+com.cnn.www+ if
no timestamp is specified would be: the value of [var]+contents:html+ from
timestamp [literal]+t6+, the value of [var]+anchor:cnnsi.com+ from timestamp
[literal]+t9+, the value of [var]+anchor:my.look.ca+ from timestamp
[literal]+t8+.
+
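+The lookup rules above (an exact timestamp either matches a stored version or returns nothing, and no timestamp means the newest version) can be sketched for a single column as follows.
+The class [code]+ColumnVersions+ is a hypothetical stand-in, not HBase code, and it assumes a map sorted in descending timestamp order as a simplification of the stored layout.
+

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Versions of one column in one row, newest first. Illustrative only.
class ColumnVersions {
    private final NavigableMap<Long, String> versions =
        new TreeMap<Long, String>(Comparator.<Long>reverseOrder());

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Request at an exact timestamp: null when nothing was stored at that timestamp.
    String getAt(long timestamp) {
        return versions.get(timestamp);
    }

    // Request without a timestamp: the first entry is the most recent version.
    String getLatest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }
}
```

+For [var]+contents:html+ with versions at 3, 5, and 6, [code]+getAt(8)+ returns nothing while [code]+getLatest()+ returns the value written at 6.
+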
+For more information about the internals of how Apache HBase stores data, see
<<regions.arch,regions.arch>>.
+
+== Namespace
+
+A namespace is a logical grouping of tables analogous to a database in
relational database systems.
+This abstraction lays the groundwork for upcoming multi-tenancy related
features:
+
+* Quota Management (HBASE-8410) - Restrict the amount of resources (i.e.,
regions, tables) a namespace can consume.
+* Namespace Security Administration (HBASE-9206) - provide another level of
security administration for tenants.
+* Region server groups (HBASE-6721) - A namespace/table can be pinned onto a
subset of regionservers thus guaranteeing a coarse level of isolation.
+
+[[namespace_creation]]
+=== Namespace management
+
+A namespace can be created, removed or altered.
+Namespace membership is determined during table creation by specifying a
fully-qualified table name of the form:
+
+[source,xml]
+----
+<table namespace>:<table qualifier>
+----
+
+.Examples
+====
+[source,bourne]
+----
+
+#Create a namespace
+create_namespace 'my_ns'
+----
+
+[source,bourne]
+----
+
+#create my_table in my_ns namespace
+create 'my_ns:my_table', 'fam'
+----
+
+[source,bourne]
+----
+
+#drop namespace
+drop_namespace 'my_ns'
+----
+
+[source,bourne]
+----
+
+#alter namespace
+alter_namespace 'my_ns', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
+----
+====
+
+[[namespace_special]]
+=== Predefined namespaces
+
+There are two predefined special namespaces:
+
+* hbase - system namespace, used to contain hbase internal tables
+* default - tables with no explicit specified namespace will automatically
fall into this namespace.
+
+.Examples
+====
+[source,bourne]
+----
+
+#namespace=foo and table qualifier=bar
+create 'foo:bar', 'fam'
+
+#namespace=default and table qualifier=bar
+create 'bar', 'fam'
+----
+====
+
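+The naming rule shown in these examples can be sketched as a small parser.
+This is not HBase's own name handling: the class [code]+TableNameSketch+ is hypothetical and performs no validation of the characters in either part.
+

```java
// Hypothetical parser for "<table namespace>:<table qualifier>"; not the HBase API.
class TableNameSketch {
    final String namespace;
    final String qualifier;

    private TableNameSketch(String namespace, String qualifier) {
        this.namespace = namespace;
        this.qualifier = qualifier;
    }

    // Names without a namespace part fall into the predefined "default" namespace.
    static TableNameSketch parse(String name) {
        int colon = name.indexOf(':');
        if (colon < 0) {
            return new TableNameSketch("default", name);
        }
        return new TableNameSketch(name.substring(0, colon), name.substring(colon + 1));
    }
}
```

+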
+== Table
+
+Tables are declared up front at schema definition time.
+
+== Row
+
+Row keys are uninterpreted bytes.
+Rows are lexicographically sorted with the lowest order appearing first in a
table.
+The empty byte array is used to denote both the start and end of a table's
namespace.
+
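+Because row keys are raw bytes, the sort order is byte-wise rather than numeric or locale-aware.
+The following sketch shows one way such a comparison can be written; the class [code]+RowKeyOrder+ is hypothetical, and treating each byte as unsigned is an assumption of this sketch.
+

```java
// Byte-wise lexicographic comparison of two row keys. Illustrative only.
class RowKeyOrder {
    static int compare(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            // Mask to 0..255 so that bytes are compared as unsigned values.
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter key sorts first.
        return a.length - b.length;
    }
}
```

+One consequence is that the key "row10" sorts before "row2", because the comparison is decided at the byte for '1' versus '2'.
+Zero-padding numeric parts of a key ("row02", "row10") is a common way to get the expected order.
+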
+[[columnfamily]]
+== Column Family
+
+Columns in Apache HBase are grouped into _column families_.
+All column members of a column family have the same prefix.
+For example, the columns _courses:history_ and _courses:math_ are both members
of the _courses_ column family.
+The colon character ([literal]+:+) delimits the column family from the
+column family qualifier.
+The column family prefix must be composed of _printable_ characters.
+The qualifying tail, the column family _qualifier_, can be made of any
arbitrary bytes.
+Column families must be declared up front at schema definition time whereas
columns do not need to be defined at schema time but can be conjured on the fly
while the table is up and running.
+
+Physically, all column family members are stored together on the filesystem.
+Because tunings and storage specifications are done at the column family
level, it is advised that all column family members have the same general
access pattern and size characteristics.
+
+== Cells
+
+A _{row, column, version}_ tuple exactly specifies a [literal]+cell+ in HBase.
+Cell content is uninterpreted bytes.
+
+== Data Model Operations
+
+The four primary data model operations are Get, Put, Scan, and Delete.
+Operations are applied via
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table]
instances.
+
+=== Get
+
+link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get]
returns attributes for a specified row.
+Gets are executed via
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#get(org.apache.hadoop.hbase.client.Get)[
+ Table.get].
+
+=== Put
+
+link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put]
either adds new rows to a table (if the key is new) or can update
existing rows (if the key already exists). Puts are executed via
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put(org.apache.hadoop.hbase.client.Put)[
+ Table.put] (writeBuffer) or
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch(java.util.List,
java.lang.Object[])[
+ Table.batch] (non-writeBuffer).
+
+[[scan]]
+=== Scans
+
+link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan]
allows iteration over multiple rows for specified attributes.
+
+The following is an example of a Scan on a Table instance.
+Assume that a table is populated with rows with keys "row1", "row2", "row3",
and then another set of rows with the keys "abc1", "abc2", and "abc3". The
following example shows how to set a Scan instance to return the rows beginning
with "row".
+
+[source,java]
+----
+
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+
+Table table = ... // instantiate a Table instance
+
+Scan scan = new Scan();
+scan.addColumn(CF, ATTR);
+scan.setRowPrefixFilter(Bytes.toBytes("row"));
+ResultScanner rs = table.getScanner(scan);
+try {
+  for (Result r = rs.next(); r != null; r = rs.next()) {
+    // process result...
+  }
+} finally {
+  rs.close(); // always close the ResultScanner!
+}
+----
+
+Note that generally the easiest way to specify a specific stop point for a
scan is by using the
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/InclusiveStopFilter.html[InclusiveStopFilter]
class.
+
+=== Delete
+
+link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html[Delete]
removes a row from a table.
+Deletes are executed via
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete(org.apache.hadoop.hbase.client.Delete)[
+ Table.delete].
+
+HBase does not modify data in place, and so deletes are handled by creating
new markers called _tombstones_.
+These tombstones, along with the dead values, are cleaned up on major
compactions.
+
+See <<version.delete,version.delete>> for more information on deleting
versions of columns, and see <<compaction,compaction>> for more information on
compactions.
+
+[[versions]]
+== Versions
+
+A _{row, column, version}_ tuple exactly specifies a [literal]+cell+ in HBase.
+It's possible to have an unbounded number of cells where the row and column
are the same but the cell address differs only in its version dimension.
+
+While rows and column keys are expressed as bytes, the version is specified
using a long integer.
+Typically this long contains time instances such as those returned by
[code]+java.util.Date.getTime()+ or [code]+System.currentTimeMillis()+, that
is: [quote]_the difference, measured in milliseconds, between the current time
and midnight, January 1, 1970 UTC_.
+
+The HBase version dimension is stored in decreasing order, so that when
reading from a store file, the most recent values are found first.
+
+There is a lot of confusion over the semantics of [literal]+cell+ versions, in
HBase.
+In particular:
+
+* If multiple writes to a cell have the same version, only the last written is
fetchable.
+* It is OK to write cells in a non-increasing version order.
+
+Below we describe how the version dimension in HBase currently works.
+See link:https://issues.apache.org/jira/browse/HBASE-2406[HBASE-2406] for
discussion of HBase versions.
link:http://outerthought.org/blog/417-ot.html[Bending time in HBase]
makes for a good read on the version, or time, dimension in HBase.
+It has more detail on versioning than is provided here.
+As of this writing, the limitation _Overwriting values at existing
timestamps_ mentioned in the article no longer holds in HBase.
+This section is basically a synopsis of this article by Bruno Dumon.
+
+[[specify.number.of.versions]]
+=== Specifying the Number of Versions to Store
+
+The maximum number of versions to store for a given column is part of the
column schema and is specified at table creation, or via an +alter+ command,
via [code]+HColumnDescriptor.DEFAULT_VERSIONS+.
+Prior to HBase 0.96, the default number of versions kept was [literal]+3+, but
in 0.96 and newer has been changed to [literal]+1+.
+
+.Modify the Maximum Number of Versions for a Column
+====
+This example uses HBase Shell to keep a maximum of 5 versions of column
[code]+f1+.
+You could also use
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
+
+----
+hbase> alter 't1', NAME => 'f1', VERSIONS => 5
+----
+====
+
+.Modify the Minimum Number of Versions for a Column
+====
+You can also specify the minimum number of versions to store.
+By default, this is set to 0, which means the feature is disabled.
+The following example sets the minimum number of versions on field [code]+f1+
to [literal]+2+, via HBase Shell.
+You could also use
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
+
+----
+hbase> alter 't1', NAME => 'f1', MIN_VERSIONS => 2
+----
+====
+
+Starting with HBase 0.98.2, you can specify a global default for the maximum
number of versions kept for all newly-created columns, by setting
+hbase.column.max.version+ in [path]_hbase-site.xml_.
+See <<hbase.column.max.version,hbase.column.max.version>>.
+
+[[versions.ops]]
+=== Versions and HBase Operations
+
+In this section we look at the behavior of the version dimension for each of
the core HBase operations.
+
+==== Get/Scan
+
+Gets are implemented on top of Scans.
+The below discussion of
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get]
applies equally to
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scans].
+
+By default, i.e., if you specify no explicit version, when doing a [literal]+get+, the cell
whose version has the largest value is returned (which may or may not be the
latest one written, see later). The default behavior can be modified in the
following ways:
+
+* to return more than one version, see
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions()[Get.setMaxVersions()]
+* to return versions other than the latest, see link:???[Get.setTimeRange()]
++
+To retrieve the latest version that is less than or equal to a given value,
thus giving the 'latest' state of the record at a certain point in time, just
use a range from 0 to the desired version and set the max versions to 1.
+
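+The idea of reading the state of a record "as of" a given version can be sketched with a sorted map.
+This models only the concept, not the client API: the class [code]+AsOfLookup+ is hypothetical, and it does not reproduce how [code]+Get.setTimeRange()+ treats the bounds of its range.
+

```java
import java.util.Map;
import java.util.TreeMap;

// Conceptual "as of" lookup over the versions of one cell. Illustrative only.
class AsOfLookup {
    // Returns the value with the largest version <= asOf, or null if there is none.
    static String asOf(TreeMap<Long, String> versions, long asOf) {
        Map.Entry<Long, String> entry = versions.floorEntry(asOf);
        return entry == null ? null : entry.getValue();
    }
}
```
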
+
+==== Default Get Example
+
+The following Get will only retrieve the current version of the row
+
+[source,java]
+----
+
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+Get get = new Get(Bytes.toBytes("row1"));
+Result r = table.get(get);
+byte[] b = r.getValue(CF, ATTR); // returns current version of value
+----
+
+==== Versioned Get Example
+
+The following Get will return the last 3 versions of the row.
+
+[source,java]
+----
+
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+Get get = new Get(Bytes.toBytes("row1"));
+get.setMaxVersions(3); // will return last 3 versions of row
+Result r = table.get(get);
+byte[] b = r.getValue(CF, ATTR); // returns current version of value
+List<KeyValue> kv = r.getColumn(CF, ATTR); // returns all versions of this column
+----
+
+==== Put
+
+Doing a put always creates a new version of a [literal]+cell+, at a certain
timestamp.
+By default the system uses the server's [literal]+currentTimeMillis+, but you
can specify the version (= the long integer) yourself, on a per-column level.
+This means you could assign a time in the past or the future, or use the long
value for non-time purposes.
+
+To overwrite an existing value, do a put at exactly the same row, column, and
version as that of the cell you would overshadow.
+
+===== Implicit Version Example
+
+The following Put will be implicitly versioned by HBase with the current time.
+
+[source,java]
+----
+
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+Put put = new Put(Bytes.toBytes(row));
+put.add(CF, ATTR, Bytes.toBytes( data));
+table.put(put);
+----
+
+===== Explicit Version Example
+
+The following Put has the version timestamp explicitly set.
+
+[source,java]
+----
+
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+Put put = new Put( Bytes.toBytes(row));
+long explicitTimeInMs = 555; // just an example
+put.add(CF, ATTR, explicitTimeInMs, Bytes.toBytes(data));
+table.put(put);
+----
+
+Caution: the version timestamp is used internally by HBase for things like
time-to-live calculations.
+It's usually best to avoid setting this timestamp yourself.
+Prefer using a separate timestamp attribute of the row, or have the timestamp
a part of the rowkey, or both.
+
+[[version.delete]]
+==== Delete
+
+There are three different types of internal delete markers.
+See Lars Hofhansl's blog for discussion of his attempt at adding another,
link:http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html[Scanning
+ in HBase: Prefix Delete Marker].
+
+* Delete: for a specific version of a column.
+* Delete column: for all versions of a column.
+* Delete family: for all columns of a particular ColumnFamily
+
+When deleting an entire row, HBase will internally create a tombstone for each
ColumnFamily (i.e., not each individual column).
+
+Deletes work by creating _tombstone_ markers.
+For example, let's suppose we want to delete a row.
+For this you can specify a version, or else by default the
[literal]+currentTimeMillis+ is used.
+What this means is [quote]_delete all
+ cells where the version is less than or equal to this version_.
+HBase never modifies data in place, so for example a delete will not
immediately delete (or mark as deleted) the entries in the storage file that
correspond to the delete condition.
+Rather, a so-called _tombstone_ is written, which will mask the deleted values.
+When HBase does a major compaction, the tombstones are processed to actually
remove the dead values, together with the tombstones themselves.
+If the version you specified when deleting a row is larger than the version of
any value in the row, then you can consider the complete row to be deleted.
+
+For an informative discussion on how deletes and versioning interact, see the
thread
link:http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/28421[Put w/
+ timestamp -> Deleteall -> Put w/ timestamp fails] up on the user
mailing list.
+
+Also see <<keyvalue,keyvalue>> for more information on the internal KeyValue
format.
+
+Delete markers are purged during the next major compaction of the store,
unless the +KEEP_DELETED_CELLS+ option is set in the column family.
+To keep the deletes for a configurable amount of time, you can set the delete
TTL via the +hbase.hstore.time.to.purge.deletes+ property in
[path]_hbase-site.xml_.
+If +hbase.hstore.time.to.purge.deletes+ is not set, or set to 0, all delete
markers, including those with timestamps in the future, are purged during the
next major compaction.
+Otherwise, a delete marker with a timestamp in the future is kept until the
major compaction which occurs after the time represented by the marker's
timestamp plus the value of +hbase.hstore.time.to.purge.deletes+, in
milliseconds.
+
+NOTE: This behavior represents a fix for an unexpected change that was
introduced in HBase 0.94, and was fixed in
link:https://issues.apache.org/jira/browse/HBASE-10118[HBASE-10118].
+The change has been backported to HBase 0.94 and newer branches.
+
+=== Current Limitations
+
+==== Deletes mask Puts
+
+Deletes mask puts, even puts that happened after the delete was entered.
+See link:https://issues.apache.org/jira/browse/HBASE-2256[HBASE-2256].
+Remember that a delete writes a tombstone, which only disappears after the
next major compaction has run.
+Suppose you do a delete of everything <= T.
+After this you do a new put with a timestamp <= T.
+This put, even if it happened after the delete, will be masked by the delete
tombstone.
+Performing the put will not fail, but when you do a get you will notice the
put had no effect.
+It will start working again after the major compaction has run.
+These issues should not be a problem if you use always-increasing versions for
new puts to a row.
+But they can occur even if you do not care about time: just do delete and put
immediately after each other, and there is some chance they happen within the
same millisecond.
+
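+The masking behavior described above can be reproduced with a small in-memory model.
+This is a deliberately simplified sketch, not HBase code: the class [code]+TombstoneSketch+ is hypothetical, it tracks a single cell, and it keeps only one tombstone that covers every version at or below its timestamp.
+

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of one cell with puts and a "delete everything <= T" tombstone.
class TombstoneSketch {
    private final TreeMap<Long, String> puts = new TreeMap<>();
    private Long tombstone = null; // null means no tombstone is present

    void put(long timestamp, String value) {
        puts.put(timestamp, value);
    }

    // Write a tombstone; nothing is removed from storage at this point.
    void delete(long timestamp) {
        tombstone = (tombstone == null) ? timestamp : Math.max(tombstone, timestamp);
    }

    // Newest visible value; versions at or below the tombstone are masked.
    String get() {
        Map.Entry<Long, String> newest = puts.lastEntry();
        if (newest == null) {
            return null;
        }
        if (tombstone != null && newest.getKey() <= tombstone) {
            return null;
        }
        return newest.getValue();
    }

    // Simplified major compaction: drop the masked values and the tombstone itself.
    void majorCompact() {
        if (tombstone != null) {
            puts.headMap(tombstone, true).clear();
            tombstone = null;
        }
    }
}
```

+In this model, a put at timestamp 15 issued after a delete at 20 is stored but invisible, and a put at the same timestamp becomes visible only once the compaction has removed the tombstone.
+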
+[[major.compactions.change.query.results]]
+==== Major compactions change query results
+
+[quote]_...create three cell versions at t1, t2 and t3, with a maximum-versions
+ setting of 2. So when getting all versions, only the values at
t2 and t3 will be
+ returned. But if you delete the version at t2 or t3, the one at
t1 will appear again.
+ Obviously, once a major compaction has run, such behavior will
not be the case
+ anymore..._ (See _Garbage Collection_ in
link:http://outerthought.org/blog/417-ot.html[Bending time in
+ HBase].)
+
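+The quoted scenario can be replayed against a toy model to see why the result depends on whether a major compaction has run.
+This sketch makes simplifying assumptions that are not taken from HBase itself: the class [code]+MaxVersionsSketch+ is hypothetical, deletes target a single version, and a compaction is modelled as physically keeping only the versions that are currently visible.
+

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Toy model of one column with a maximum-versions setting. Illustrative only.
class MaxVersionsSketch {
    private final int maxVersions;
    private final TreeMap<Long, String> stored =
        new TreeMap<Long, String>(Comparator.<Long>reverseOrder());
    private final Set<Long> deleted = new HashSet<>();

    MaxVersionsSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        stored.put(timestamp, value);
    }

    // Mark one version as deleted; it stays in storage until a compaction.
    void deleteVersion(long timestamp) {
        deleted.add(timestamp);
    }

    // Timestamps returned when asking for all versions, newest first.
    List<Long> visibleVersions() {
        List<Long> out = new ArrayList<>();
        for (Long timestamp : stored.keySet()) {
            if (deleted.contains(timestamp)) {
                continue; // masked by a delete marker
            }
            if (out.size() == maxVersions) {
                break; // older versions exceed the maximum
            }
            out.add(timestamp);
        }
        return out;
    }

    // Simplified major compaction: keep only what is visible, drop the markers.
    void majorCompact() {
        stored.keySet().retainAll(visibleVersions());
        deleted.clear();
    }
}
```

+With versions 1, 2, and 3 and a maximum of 2, deleting version 2 before any compaction makes version 1 reappear; if the compaction runs first, version 1 is already gone and only version 3 remains.
+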
+[[dm.sort]]
+== Sort Order
+
+All data model operations in HBase return data in sorted order.
+First by row, then by ColumnFamily, followed by column qualifier, and finally
timestamp (sorted in reverse, so newest records are returned first).
+
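+The sort order can be expressed as a comparator.
+The following is a sketch under stated assumptions: the class [code]+CellKey+ is hypothetical and uses strings for readability, whereas HBase compares the underlying bytes.
+

```java
import java.util.Comparator;

// Hypothetical key type used only to illustrate the ordering of results.
class CellKey {
    final String row;
    final String family;
    final String qualifier;
    final long timestamp;

    CellKey(String row, String family, String qualifier, long timestamp) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.timestamp = timestamp;
    }

    // Row, then column family, then qualifier ascending; timestamp descending.
    static final Comparator<CellKey> ORDER =
        Comparator.comparing((CellKey k) -> k.row)
            .thenComparing(k -> k.family)
            .thenComparing(k -> k.qualifier)
            .thenComparing(Comparator.comparingLong((CellKey k) -> k.timestamp).reversed());
}
```

+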
+[[dm.column.metadata]]
+== Column Metadata
+
+There is no store of column metadata outside of the internal KeyValue
instances for a ColumnFamily.
+Thus, while HBase can support not only a wide number of columns per row, but a
heterogenous set of columns between rows as well, it is your responsibility to
keep track of the column names.
+
+The only way to get a complete set of columns that exist for a ColumnFamily is
to process all the rows.
+For more information about how HBase stores data internally, see
<<keyvalue,keyvalue>>.
+
+== Joins
+
+Whether HBase supports joins is a common question on the dist-list, and there
is a simple answer: it doesn't, at least not in the way that RDBMSs support
them (e.g., with equi-joins or outer-joins in SQL). As has been illustrated in
this chapter, the read data model operations in HBase are Get and Scan.
+
+However, that doesn't mean that equivalent join functionality can't be
supported in your application, but you have to do it yourself.
+The two primary strategies are denormalizing the data upon writing to
HBase, or keeping lookup tables and doing the join between HBase tables in your
application or MapReduce code (and as RDBMSs demonstrate, there are several
strategies for this depending on the size of the tables, e.g., nested loops vs.
hash-joins). So which is the best approach? It depends on what you are
trying to do, and as such there isn't a single answer that works for every use
case.
+
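+As an illustration of the lookup-table strategy, the following sketch joins two data sets in application code using a hash lookup.
+Everything in it is hypothetical: the class [code]+AppSideJoin+, the order and customer data, and the use of in-memory maps, which here stand in for rows already read from two HBase tables.
+

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Application-side inner join over data already fetched from two tables.
class AppSideJoin {
    // orderToCustomer maps order id -> customer id; customerNames maps customer id -> name.
    // Each result has the form "orderId:customerName".
    static List<String> hashJoin(Map<String, String> orderToCustomer,
                                 Map<String, String> customerNames) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, String> order : orderToCustomer.entrySet()) {
            // The second map acts as the hash table that is probed for each order.
            String name = customerNames.get(order.getValue());
            if (name != null) {
                joined.add(order.getKey() + ":" + name);
            }
        }
        return joined;
    }
}
```

+Orders whose customer id has no match are dropped, which corresponds to an inner join; the output order follows the iteration order of the first map.
+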
+== ACID
+
+See link:http://hbase.apache.org/acid-semantics.html[ACID Semantics].
+Lars Hofhansl has also written a note on
link:http://hadoop-hbase.blogspot.com/2012/03/acid-in-hbase.html[ACID in HBase].
+
+ifdef::backend-docbook[]
+[index]
+== Index
+// Generated automatically by the DocBook toolchain.
+endif::backend-docbook[]