Thank you, Lars. My question is answered.

regards,
Lin

On Mon, Aug 6, 2012 at 12:30 PM, lars hofhansl <lhofha...@yahoo.com> wrote:

> A key in HBase looks like this: (rowkey, column family, column, timestamp)
>
> HBase will do two things for you:
> 1. All keys that have the same row key are stored in the same region
> 2. All keys are sorted
>
> (The column family is special in that each column family has its own store
> file, but the logical sort order still holds).
>
> Think of it this way.
> Say you have two column families and two regions (A and B). You find the
> following ordering:
>
> Storefile(s) for column family 1 in Region A:
> (row1, column family1, column1, ts)->value
> (row1, column family1, column2, ts)->value
> (row2, column family1, column1, ts)->value
> (row2, column family1, column2, ts)->value
>
> Storefile(s) for column family 1 in Region B:
> (row3, column family1, column1, ts)->value
> (row3, column family1, column2, ts)->value
>
> Storefile(s) for column family 2 in Region A:
> (row1, column family2, column1, ts)->value
> (row1, column family2, column2, ts)->value
> (row2, column family2, column1, ts)->value
> (row2, column family2, column2, ts)->value
>
> Storefile(s) for column family 2 in Region B:
> (row3, column family2, column1, ts)->value
> (row3, column family2, column2, ts)->value
>
> So region A has rows row1 and row2, region B has row3.
> A region is a shard of a table based on the row key and just
>
> #1 above means that HBase will never place key values for "row1" in
> different regions.
> #2 means you can very efficiently locate specific keys, as they are always
> stored sorted.
>
> You should work through the topic in the HBase book:
> http://hbase.apache.org/book/datamodel.html.
>
> -- Lars
>
>
> ----- Original Message -----
> From: Lin Ma <lin...@gmail.com>
> To: user@hbase.apache.org; lars hofhansl <lhofha...@yahoo.com>
> Cc:
> Sent: Sunday, August 5, 2012 8:44 PM
> Subject: Re: column based or row based storage for HBase?
>
> Hi Lars,
>
> What do you mean by a set of "keys that have the same row key" and
> "colocated"? I would appreciate it if you could show an example or provide
> more information.
>
> regards,
> Lin
>
> On Mon, Aug 6, 2012 at 3:42 AM, lars hofhansl <lhofha...@yahoo.com> wrote:
>
> > Hi Lin,
> >
> > HBase stores key -> value mappings sorted by key. So it is a key value
> > store.
> >
> > The key has internal structure, for example it starts with a row key.
> > HBase makes extra guarantees about a set of keys that have the same row
> > key (keeps them colocated, allows atomic operations, etc).
> >
> > I tried to write this up a while back:
> > http://hadoop-hbase.blogspot.com/2011/12/introduction-to-hbase.html
> >
> > -- Lars
> >
> >
> > ----- Original Message -----
> > From: Lin Ma <lin...@gmail.com>
> > To: user@hbase.apache.org
> > Cc:
> > Sent: Sunday, August 5, 2012 6:04 AM
> > Subject: column based or row based storage for HBase?
> >
> > Hi guys,
> >
> > I am wondering whether HBase uses column based storage or row based
> > storage?
> >
> > - I read some technical documents which mention that an advantage of
> > HBase is using column based storage to store similar data together to
> > foster compression. So it means the same columns of different rows are
> > stored together;
> > - But I also learned HBase is a sorted key-value map in the underlying
> > HFile. It uses the key to address all related columns for that key (row),
> > so it seems to be a row based storage?
> >
> > I would appreciate it if anyone could clarify my confusion. Any related
> > documents or code for more details are welcome.
> >
> > thanks in advance,
> >
> > Lin
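
For anyone finding this thread later, the layout Lars describes can be mimicked in a few lines of Python. This is only a rough sketch under stated assumptions: plain tuples stand in for HBase keys, and `SPLIT_KEY`, `region_for` and `layout` are hypothetical names made up for illustration, not part of any HBase API. It routes each cell to a region by row key alone (point #1) and keeps every (region, column family) store sorted by key (point #2). Sorting newer timestamps first is included as an assumption about version ordering; the thread itself does not discuss it.

```python
from collections import defaultdict

# Hypothetical split point: rows below "row3" live in region A, the rest in B.
SPLIT_KEY = "row3"


def region_for(row_key):
    """Choose a region from the row key alone, so a row never spans regions."""
    return "A" if row_key < SPLIT_KEY else "B"


def layout(cells):
    """Group cells into one sorted store per (region, column family)."""
    stores = defaultdict(list)
    for key, value in cells:
        row, family, column, ts = key
        stores[(region_for(row), family)].append((key, value))
    for entries in stores.values():
        # Sort by row, family, column, then newest timestamp first (assumed).
        entries.sort(key=lambda kv: (kv[0][0], kv[0][1], kv[0][2], -kv[0][3]))
    return dict(stores)


# Cells arrive in arbitrary order; keys are (row, family, column, timestamp).
cells = [
    ((row, family, column, 1), "value")
    for row in ("row2", "row3", "row1")
    for family in ("cf2", "cf1")
    for column in ("column2", "column1")
]

if __name__ == "__main__":
    for (region, family), entries in sorted(layout(cells).items()):
        print(f"Storefile for {family} in Region {region}:")
        for key, value in entries:
            print(f"  {key} -> {value}")
```

Running it prints four stores, matching the four "Storefile(s)" groups in Lars's example: row1 and row2 end up in region A, row3 in region B, and each column family gets its own sorted store within a region.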