On Sun, Aug 5, 2012 at 8:03 PM, Lin Ma <lin...@gmail.com> wrote:

> Thank you for the informative reply, Mohit!
>
> Some more comments,
>
> 1. actually my confusion about column based storage is from the book
> "HBase The Definitive Guide", chapter 1, section "the Dawn of Big Data",
> which draw a picture showing HBase store the same column of all different
> rows continuously physically in storage. Any comments?
>
> 2. I want to confirm my understanding is correct -- supposing I have only
> one column family with 10 columns, the physical storage is row (with all
> related columns) after row, other than store 1st column of all rows, then
> store 2nd columns of all rows, etc?
>
> 3. It seems when we say column based storage, there are two meanings, (1)
> column oriented database => en.wikipedia.org/wiki/Column-oriented_DBMS,
> where the same column of different rows stored together, (2) and column
> oriented architecture, e.g. how Hbase is designed, which is used to
> describe the pattern to store sparse, large number of columns (with NULL
> for free). Any comments?
>
>

In simple terms, HBase is not a column-oriented store in the sense of your
meaning (1). Within a column family, all the data for a row is stored
together, row after row, so your understanding in point 2 is correct. Store
files are created per column family, not per column.
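
To make the layout concrete, here is a toy sketch in plain Python. It is not
HBase code and does not use the HBase API; the row keys, family names and
values are made up. It only mimics the idea that cells are grouped per column
family and, inside each family, kept sorted by row key, then column qualifier,
then timestamp (newest first).

```python
from collections import defaultdict

# Toy cells: (row key, column family, qualifier, timestamp, value).
# All names and values are hypothetical, chosen only for illustration.
cells = [
    ("row2", "info", "name", 1, "bob"),
    ("row1", "info", "name", 1, "alice"),
    ("row1", "info", "email", 1, "alice@example.com"),
    ("row1", "stats", "visits", 1, "42"),
    ("row2", "stats", "visits", 1, "7"),
]


def layout_by_family(cells):
    """Group cells per column family, then sort each group by key."""
    stores = defaultdict(list)
    for row, family, qualifier, ts, value in cells:
        stores[family].append((row, qualifier, ts, value))
    for family in stores:
        # Row key first, then qualifier, then newest timestamp first.
        stores[family].sort(key=lambda c: (c[0], c[1], -c[2]))
    return dict(stores)


stores = layout_by_family(cells)
for family, kvs in sorted(stores.items()):
    print(family)
    for row, qualifier, ts, value in kvs:
        print(f"  {row}/{family}:{qualifier}/{ts} -> {value}")
```

In the "info" group, both cells of row1 sit next to each other before row2
starts, which is the row-after-row layout from point 2. row2 has no "email"
cell, so nothing at all is stored for it; that is why a missing column costs
nothing.
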
> regards,
> Lin
>
>
> On Mon, Aug 6, 2012 at 12:08 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>
>> On Sun, Aug 5, 2012 at 6:04 AM, Lin Ma <lin...@gmail.com> wrote:
>>
>> > Hi guys,
>> >
>> > I am wondering whether HBase is using column based storage or row based
>> > storage?
>> >
>> > - I read some technical documents and mentioned advantages of HBase is
>> > using column based storage to store similar data together to foster
>> > compression. So it means same columns of different rows are stored
>> > together;
>>
>>
>> Probably what you read was in context of Column Families. HBase has concept
>> of column family similar to Google's bigtable. And the store files on disk
>> is per column family. All columns of a given column family are in one store
>> file and columns of different column family is a different file.
>>
>>
>> > - But I also learned HBase is a sorted key-value map in underlying
>> > HFile. It uses key to address all related columns for that key (row),
>> > so it seems to be a row based storage?
>> >
>> HBase stores entire row together along with columns represented by
>> KeyValue. This is also called cell in HBase.
>>
>>
>> > It is appreciated if anyone could clarify my confusions. Any related
>> > documents or code for more details are welcome.
>> >
>> > thanks in advance,
>> >
>> > Lin
>> >
>> >
>