Thank you, Lars.

My question is answered.

regards,
Lin

On Mon, Aug 6, 2012 at 12:30 PM, lars hofhansl <lhofha...@yahoo.com> wrote:

> A key in HBase looks like this: (rowkey, column family, column, timestamp)
>
> HBase will do two things for you:
> 1. All keys that have the same row key are stored in the same region
> 2. All keys are sorted
>
>
> (The column family is special in that each column family has its own store
> file(s), but the logical sort order still holds).
>
> Think of it this way.
> Say you have two column families and two regions (A and B). You find the
> following ordering:
> Storefile(s) for column family 1 in Region A:
> (row1, column family1, column1, ts)->value
> (row1, column family1, column2, ts)->value
> (row2, column family1, column1, ts)->value
> (row2, column family1, column2, ts)->value
>
> Storefile(s) for column family 1 in Region B:
> (row3, column family1, column1, ts)->value
> (row3, column family1, column2, ts)->value
>
> Storefile(s) for column family 2 in Region A:
> (row1, column family2, column1, ts)->value
> (row1, column family2, column2, ts)->value
> (row2, column family2, column1, ts)->value
> (row2, column family2, column2, ts)->value
>
> Storefile(s) for column family 2 in Region B:
> (row3, column family2, column1, ts)->value
> (row3, column family2, column2, ts)->value
>
> So region A has rows row1 and row2, region B has row3.
> A region is a shard of a table based on the row key and just
>
> #1 above means that HBase will never place key values for "row1" in
> different regions.
> #2 means you can very efficiently locate specific keys, as they are always
> stored sorted.
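
The layout described above can be sketched in a few lines of Python. This is only an illustrative model, not HBase code: the names `REGION_SPLITS`, `region_for`, `build_stores` and `lookup`, the short family names `cf1`/`cf2`, and the split point at "row3" are all assumptions made for the example. It uses a single timestamp per cell and does not model how HBase orders multiple versions of the same cell.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical region boundaries: (name, start row inclusive, end row exclusive).
# Rows below "row3" land in region A, everything else in region B.
REGION_SPLITS = [("A", "", "row3"), ("B", "row3", None)]


def region_for(row_key):
    """Return the name of the region whose row-key range contains row_key."""
    for name, start, end in REGION_SPLITS:
        if row_key >= start and (end is None or row_key < end):
            return name
    raise KeyError(row_key)


def build_stores(cells):
    """Group cells by (region, column family); keep each group sorted by key."""
    stores = defaultdict(list)
    for key, value in cells:
        row, family, _column, _ts = key
        stores[(region_for(row), family)].append((key, value))
    for entries in stores.values():
        entries.sort(key=lambda kv: kv[0])  # plain tuple ordering of the key
    return dict(stores)


def lookup(stores, key):
    """Binary-search the single store that could hold key; None if absent."""
    row, family, _column, _ts = key
    entries = stores.get((region_for(row), family), [])
    keys = [k for k, _ in entries]
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return entries[i][1]
    return None


# Cells arrive in arbitrary order; key = (rowkey, column family, column, ts).
cells = [
    ((row, family, column, 1), f"{row}/{family}/{column}")
    for row in ("row3", "row1", "row2")
    for family in ("cf2", "cf1")
    for column in ("column2", "column1")
]
stores = build_stores(cells)

for (region, family), entries in sorted(stores.items()):
    print(f"Store for {family} in region {region}:")
    for key, value in entries:
        print("   ", key, "->", value)
```

Because the region is chosen from the row key alone, every cell for "row1" ends up in region A regardless of its column family (point #1), and because each store is kept sorted, `lookup` can find a key by binary search instead of scanning (point #2).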
>
> You should work through the topic in the HBase book:
> http://hbase.apache.org/book/datamodel.html.
>
> -- Lars
>
>
> ----- Original Message -----
> From: Lin Ma <lin...@gmail.com>
> To: user@hbase.apache.org; lars hofhansl <lhofha...@yahoo.com>
> Cc:
> Sent: Sunday, August 5, 2012 8:44 PM
> Subject: Re: column based or row based storage for HBase?
>
> Hi Lars,
>
> What do you mean by a set of "keys that have the same row key" and
> "colocated"? I would appreciate it if you could show an example or provide
> more information.
>
> regards,
> Lin
>
> On Mon, Aug 6, 2012 at 3:42 AM, lars hofhansl <lhofha...@yahoo.com> wrote:
>
> > Hi Lin,
> >
> > HBase stores key -> value mappings sorted by key. So it is a key value
> > store.
> >
> > The key has internal structure, for example it starts with a row key.
> > HBase makes extra guarantees about a set of keys that have the same row
> > key (keeps them colocated, allows atomic operations, etc).
> >
> > I tried to write this up a while back:
> > http://hadoop-hbase.blogspot.com/2011/12/introduction-to-hbase.html
> >
> > -- Lars
> >
> >
> >
> > ----- Original Message -----
> > From: Lin Ma <lin...@gmail.com>
> > To: user@hbase.apache.org
> > Cc:
> > Sent: Sunday, August 5, 2012 6:04 AM
> > Subject: column based or row based storage for HBase?
> >
> > Hi guys,
> >
> > I am wondering whether HBase is using column based storage or row based
> > storage?
> >
> >    - I read some technical documents that mention an advantage of HBase is
> >    that it uses column based storage to keep similar data together, which
> >    helps compression. That would mean the same columns of different rows
> >    are stored together;
> >    - But I also learned that HBase is a sorted key-value map in the
> >    underlying HFile. It uses the key to address all related columns for
> >    that key (row), so it seems to be row based storage?
> >
> > I would appreciate it if anyone could clarify my confusion. Any related
> > documents or code for more details are welcome.
> >
> > thanks in advance,
> >
> > Lin
> >
> >
>
>
