Disk Seeks and Column families

2012-01-20 Thread Praveen Sripati
Hi, 1) According to this URL (1), HBase performs well for two or three column families. Why is it so? 2) A dump of an HFile looks like the one below. The contents of a row stay together like a regular row-oriented database. If the column family has 100 column qualifiers and is dense then the dat

Re: Disk Seeks and Column families

2012-01-21 Thread Andrey Stepachev
2012/1/21 Praveen Sripati: > Hi, > > 1) According to this URL (1), HBase performs well for two or three > column families. Why is it so? First, each column family is stored in a separate location, so, as stated in '6.2.1. Cardinality of ColumnFamilies', such a schema design can lead to many small pi

Re: Disk Seeks and Column families

2012-01-21 Thread Doug Meil
Also, for #2, HBase supports large-scale aggregation through MapReduce. On 1/21/12 7:47 AM, "Andrey Stepachev" wrote: >2012/1/21 Praveen Sripati: >> Hi, >> >> 1) According to this URL (1), HBase performs well for two or three >> column families. Why is it so? > >First, each column family

Re: Disk Seeks and Column families

2012-01-21 Thread Doug Meil
One other "big picture" comment: HBase scales by having lots of servers, and servers with multiple drives. While single-read performance is obviously important, there is more to HBase than a single-server RDBMS drag-race comparison. It's a distributed architecture (as with MapReduce). re: "hba

Re: Disk Seeks and Column families

2012-01-21 Thread yuzhihong
Have you considered using AggregationProtocol to perform aggregation? Thanks On Jan 20, 2012, at 11:08 PM, Praveen Sripati wrote: > Hi, > > 1) According to this URL (1), HBase performs well for two or three > column families. Why is it so? > > 2) A dump of an HFile looks like the one below. The

Re: Disk Seeks and Column families

2012-01-21 Thread Praveen Sripati
Thanks for the response. > The contents of a row stay together like a regular row-oriented database. > K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50 > K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50 > K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51 > K: row-551/colfam1:51/1309812
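For readers unfamiliar with the dump format quoted above, here is a minimal sketch that splits each `K: ... V: ...` line into its parts. The field names (`row`, `family`, `qualifier`, `timestamp`, `op_type`, `value_length`) and the helper `parse_dump_line` are illustrative choices, not HBase identifiers, and the parser assumes that neither the row key nor the qualifier contains a `/`.

```python
from typing import NamedTuple


class DumpedCell(NamedTuple):
    """One cell as printed in the dump (field names are illustrative)."""
    row: str
    family: str
    qualifier: str
    timestamp: int
    op_type: str
    value_length: int
    value: str


def parse_dump_line(line: str) -> DumpedCell:
    """Split one 'K: ... V: ...' dump line into its fields."""
    key_part, value = line.strip().split(" V: ", 1)
    key = key_part[len("K: "):]
    # Assumption: neither the row key nor the qualifier contains '/'.
    row, column, timestamp, op_type, vlen = key.split("/")
    family, qualifier = column.split(":", 1)
    return DumpedCell(row, family, qualifier, int(timestamp), op_type,
                      int(vlen.split("=", 1)[1]), value)


# The three complete sample lines from the message above.
sample = [
    "K: row-550/colfam1:50/1309813948188/Put/vlen=2 V: 50",
    "K: row-550/colfam1:50/1309812287166/Put/vlen=2 V: 50",
    "K: row-551/colfam1:51/1309813948222/Put/vlen=2 V: 51",
]
cells = [parse_dump_line(line) for line in sample]
```

Every cell carries its full row key, family, and qualifier, which is why cells of the same row sit next to each other in the file. In the sample, the two versions of `row-550` are listed with the newer timestamp first.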

Re: Disk Seeks and Column families

2012-01-21 Thread Doug Meil
Compression is at the block level within the StoreFile (HFile), so yes, they can take advantage of compression. On 1/21/12 12:49 PM, "Praveen Sripati" wrote: >Thanks for the response. > >> The contents of a row stay together like a regular row-oriented >>database. > >> K: row-550/colfam1:50/1
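As a rough illustration of what "block level" means here (this is not HBase code), the sketch below cuts a byte stream into fixed-size blocks and compresses each block independently, so one block can be read back without touching the others. `zlib` stands in for whatever codec the table is configured with, and the 64 KB block size and the helper names are assumptions made for the example.

```python
import zlib
from typing import List

BLOCK_SIZE = 64 * 1024  # assumed block size for this sketch


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Compress each fixed-size slice of data on its own."""
    return [zlib.compress(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def read_block(blocks: List[bytes], index: int) -> bytes:
    """Decompress a single block without reading its neighbours."""
    return zlib.decompress(blocks[index])


# Repetitive keys, like the ones in the dump, give the codec
# common substrings to exploit within each block.
data = b"row-550/colfam1:50/1309813948188/Put/vlen=2\n" * 5000
blocks = compress_blocks(data)
restored = b"".join(read_block(blocks, i) for i in range(len(blocks)))
```

Because each block is compressed on its own, only substrings repeated inside the same block help the compression ratio.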

Re: Disk Seeks and Column families

2012-01-21 Thread Andrey Stepachev
On January 21, 2012 at 19:16, Doug Meil wrote: > > One other "big picture" comment: HBase scales by having lots of servers, > and servers with multiple drives. While single-read performance is > obviously important, there is more to HBase than a single-server RDBMS > drag-race comparison

Re: Disk Seeks and Column families

2012-01-21 Thread M. C. Srivas
Praveen, basically you are correct on all counts. If there are too many columns, HBase will have to issue more disk seeks to extract only the particular columns you need ... and since the data is laid out horizontally there are fewer common substrings in a single HBase block and compression qua

Re: Disk Seeks and Column families

2012-01-23 Thread Praveen Sripati
Thanks for the response. I am just getting started with HBase. And before getting into the code/API level details, I am trying to understand the problem area HBase is trying to address through its architecture/design. 1) So, what are the recommendations for having many columns and with dense data

Re: Disk Seeks and Column families

2012-01-23 Thread Andrey Stepachev
2012/1/24 Praveen Sripati: > Thanks for the response. I am just getting started with HBase. And before > getting into the code/API level details, I am trying to understand the > problem area HBase is trying to address through its architecture/design. > > 1) So, what are the recommendations for ha

Re: Disk Seeks and Column families

2012-01-23 Thread Andrey Stepachev
2012/1/24 Andrey Stepachev: > 2012/1/24 Praveen Sripati: > > a) As in 1), add something to key. For example each 5 minutes. Later you can issue 16 queries and merge them (for realtime) eah... 3 minutes :) -- Andrey.
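The quoted suggestion is cut off, so what follows is only one possible reading of it: put a bucket number in front of each key when writing, then read by issuing one range query per bucket and merging the sorted results. The sketch uses a sorted Python list in place of a table. The bucket count of 16 comes from the "16 queries" in the message, while the hash-based bucketing rule, the `NN|` prefix format, and all function names are assumptions for illustration. The "each 5 minutes" part of the message is not modelled.

```python
import heapq
import zlib
from bisect import bisect_left
from typing import List

NUM_BUCKETS = 16  # from the "16 queries" in the message; the rest is assumed


def bucket_of(key: str) -> int:
    """Stable hash so a key always lands in the same bucket."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


def salted_key(key: str) -> str:
    """Prefix the key with its two-digit bucket number."""
    return f"{bucket_of(key):02d}|{key}"


def scan_bucket(table: List[str], bucket: int, start: str, stop: str) -> List[str]:
    """Range query [start, stop) inside one bucket of the sorted table."""
    prefix = f"{bucket:02d}|"
    lo = bisect_left(table, prefix + start)
    hi = bisect_left(table, prefix + stop)
    return [k[len(prefix):] for k in table[lo:hi]]


def scan_all(table: List[str], start: str, stop: str) -> List[str]:
    """Issue one query per bucket and merge the sorted partial results."""
    partial = [scan_bucket(table, b, start, stop) for b in range(NUM_BUCKETS)]
    return list(heapq.merge(*partial))


keys = [f"event-{i:04d}" for i in range(200)]
table = sorted(salted_key(k) for k in keys)  # stands in for the stored rows
result = scan_all(table, "event-0050", "event-0060")
```

The price of spreading writes across buckets is that every range read turns into 16 smaller reads plus a merge step.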

Re: Disk Seeks and Column families

2012-01-24 Thread Jason Frantz
On Tue, Jan 24, 2012 at 11:45 AM, Praveen Sripati wrote: > Thanks for the response. I am just getting started with HBase. And before > getting into the code/API level details, I am trying to understand the > problem area HBase is trying to address through its architecture/design. > > 1) So, what