Thank you, Yong. Just to clarify one thing about your comment: "column family stores continuously" does not mean the data are physically stored *column after column* (e.g. col 1 of row 1, then col 1 of row 2, then col 1 of row 3, then col 2 of row 1, then col 2 of row 2, and finally col 2 of row 3), but rather physically stored *row after row* (col 1 of row 1, then col 2 of row 1, then col 1 of row 2, then col 2 of row 2, then col 1 of row 3, then col 2 of row 3)?
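To make sure we mean the same thing, here is a toy sketch of the two orderings for a single column family. It is plain Python over made-up (row, column, value) tuples, not the HBase API or the HFile format; the function names are hypothetical and chosen only for this illustration.

```python
# Toy model: the cells of ONE column family as (row, column, value) tuples.
# Plain Python only; nothing here is an HBase API or the real HFile layout.
cells = [
    ("row1", "col1", "a1"), ("row1", "col2", "b1"),
    ("row2", "col1", "a2"), ("row2", "col2", "b2"),
    ("row3", "col1", "a3"), ("row3", "col2", "b3"),
]


def column_after_column(cells):
    """Order cells by column first, then by row (classic column store)."""
    return sorted(cells, key=lambda c: (c[1], c[0]))


def row_after_row(cells):
    """Order cells by row first, then by column (sorted key-value layout)."""
    return sorted(cells, key=lambda c: (c[0], c[1]))


def layout(ordered):
    """Render an ordering as short human-readable labels."""
    return [f"{col} of {row}" for row, col, _ in ordered]


if __name__ == "__main__":
    print("column after column:", layout(column_after_column(cells)))
    print("row after row:      ", layout(row_after_row(cells)))
```

If I read your reply correctly, the second ordering (row_after_row) is the one that applies within a column family.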
regards,
Lin

On Mon, Aug 6, 2012 at 11:37 AM, yonghu <yongyong...@gmail.com> wrote:

> In my understanding of the column-oriented structure of HBase, the first
> thing is the term column-oriented. The meaning is that the data which
> belongs to the same column family is stored contiguously on disk. For
> each column family, the data is stored as a row store. If you want to
> understand the internal mechanism of HBase, you'd better take a look
> at the content of HFile.
>
> regards!
>
> Yong
>
> On Mon, Aug 6, 2012 at 5:03 AM, Lin Ma <lin...@gmail.com> wrote:
> > Thank you for the informative reply, Mohit!
> >
> > Some more comments,
> >
> > 1. Actually, my confusion about column based storage comes from the book
> > "HBase The Definitive Guide", chapter 1, section "the Dawn of Big Data",
> > which draws a picture showing HBase storing the same column of all
> > different rows contiguously in physical storage. Any comments?
> >
> > 2. I want to confirm my understanding is correct: supposing I have only
> > one column family with 10 columns, the physical storage is row (with all
> > related columns) after row, rather than storing the 1st column of all
> > rows, then the 2nd column of all rows, etc.?
> >
> > 3. It seems that when we say column based storage, there are two meanings:
> > (1) column oriented database => en.wikipedia.org/wiki/Column-oriented_DBMS,
> > where the same column of different rows is stored together, and (2) column
> > oriented architecture, e.g. how HBase is designed, which is used to
> > describe the pattern for storing a sparse, large number of columns (with
> > NULL for free). Any comments?
> >
> > regards,
> > Lin
> >
> > On Mon, Aug 6, 2012 at 12:08 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> >
> >> On Sun, Aug 5, 2012 at 6:04 AM, Lin Ma <lin...@gmail.com> wrote:
> >>
> >> > Hi guys,
> >> >
> >> > I am wondering whether HBase is using column based storage or row based
> >> > storage?
> >> >
> >> > - I read some technical documents which mentioned that an advantage of
> >> > HBase is using column based storage to store similar data together to
> >> > foster compression. So it means the same columns of different rows are
> >> > stored together;
> >>
> >> Probably what you read was in the context of column families. HBase has
> >> the concept of a column family, similar to Google's Bigtable, and the
> >> store files on disk are per column family. All columns of a given column
> >> family are in one store file, and columns of a different column family
> >> are in a different file.
> >>
> >> > - But I also learned HBase is a sorted key-value map in the underlying
> >> > HFile. It uses the key to address all related columns for that key
> >> > (row), so it seems to be a row based storage?
> >>
> >> HBase stores the entire row together, along with columns represented by
> >> KeyValue. This is also called a cell in HBase.
> >>
> >> > It is appreciated if anyone could clarify my confusion. Any related
> >> > documents or code for more details are welcome.
> >> >
> >> > thanks in advance,
> >> >
> >> > Lin