Thank you for the informative reply, Mohit! Some more comments:

1. Actually, my confusion about column-based storage comes from the book "HBase: The Definitive Guide", chapter 1, section "The Dawn of Big Data", which has a figure showing HBase storing the same column of all rows contiguously on disk. Any comments?

2. I want to confirm that my understanding is correct: supposing I have only one column family with 10 columns, is the physical storage row after row (each row with all of its columns), rather than the 1st column of all rows, then the 2nd column of all rows, and so on?

3. It seems that "column-based storage" can mean two things: (1) a column-oriented database (en.wikipedia.org/wiki/Column-oriented_DBMS), where the same column of different rows is stored together; and (2) a column-oriented architecture, e.g. how HBase is designed, which describes a pattern for storing a large number of sparse columns (with NULLs for free). Any comments?

regards,
Lin

On Mon, Aug 6, 2012 at 12:08 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:

> On Sun, Aug 5, 2012 at 6:04 AM, Lin Ma <lin...@gmail.com> wrote:
>
> > Hi guys,
> >
> > I am wondering whether HBase is using column based storage or row based
> > storage?
> >
> >    - I read some technical documents and mentioned advantages of HBase is
> >    using column based storage to store similar data together to foster
> >    compression. So it means same columns of different rows are stored
> >    together;
>
> Probably what you read was in context of Column Families. HBase has concept
> of column family similar to Google's bigtable. And the store files on disk
> is per column family. All columns of a given column family are in one store
> file and columns of different column family is a different file.
>
> >    - But I also learned HBase is a sorted key-value map in underlying
> >    HFile. It uses key to address all related columns for that key (row),
> >    so it seems to be a row based storage?
>
> HBase stores entire row together along with columns represented by
> KeyValue. This is also called cell in HBase.
>
> > It is appreciated if anyone could clarify my confusions. Any related
> > documents or code for more details are welcome.
> >
> > thanks in advance,
> >
> > Lin
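
P.S. To make question 2 concrete, here is a toy Python sketch of the layout as I understand it from Mohit's reply. It is not HBase code: the function name `layout_store_files` and the tuple format are made up for illustration, and I am assuming that each column family gets its own store file and that cells inside a file are ordered by row key, then column qualifier, then newest timestamp first.

```python
from collections import defaultdict

def layout_store_files(cells):
    """Toy model (assumption, not the HBase API): split cells by column
    family, then sort each family's cells by row key, qualifier, and
    newest timestamp first."""
    stores = defaultdict(list)
    for row, family, qualifier, ts, value in cells:
        stores[family].append((row, qualifier, ts, value))
    for family in stores:
        # Negate the timestamp so newer versions sort before older ones.
        stores[family].sort(key=lambda c: (c[0], c[1], -c[2]))
    return dict(stores)

# Cells arrive in arbitrary order: (row, family, qualifier, timestamp, value)
cells = [
    ("row2", "cf", "col1", 1, "d"),
    ("row1", "cf", "col2", 1, "b"),
    ("row1", "cf", "col1", 1, "a"),
    ("row2", "cf", "col2", 1, "e"),
    ("row1", "other", "x", 1, "z"),
]

stores = layout_store_files(cells)
for family, kvs in stores.items():
    print(family, [(row, qualifier) for row, qualifier, _, _ in kvs])
```

If this model is right, then within the single family "cf" all columns of row1 sit next to each other before any column of row2, which is the row-after-row layout from question 2, while the separate "other" family ends up in a different file.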