column based or row based storage for HBase?

2012-08-05 Thread Lin Ma
Hi guys, I am wondering whether HBase uses column-based or row-based storage? - I read some technical documents which mention that an advantage of HBase is using column-based storage to store similar data together to foster compression. So it means same columns of different rows

Re: Problems starting HBase

2012-08-05 Thread Stack
On Fri, Aug 3, 2012 at 10:48 PM, sk101 lasjdf89a...@devnullmail.com wrote: Hi guys, I've been trying to set up HBase for OpenTSDB for a few days now and am completely stuck. I've gotten .92 running on a virtual machine but I am completely unable to deploy it to a real machine. Firstly, I've

Re: column based or row based storage for HBase?

2012-08-05 Thread Mohit Anchlia
On Sun, Aug 5, 2012 at 6:04 AM, Lin Ma lin...@gmail.com wrote: Hi guys, I am wondering whether HBase uses column-based or row-based storage? - I read some technical documents which mention that an advantage of HBase is using column-based storage to store similar data together to

Re: column based or row based storage for HBase?

2012-08-05 Thread lars hofhansl
Hi Lin, HBase stores key - value mappings sorted by key. So it is a key value store. The key has internal structure, for example it starts with a row key. HBase makes extra guarantees about a set of keys that have the same row key (keeps them colocated, allows atomic operations, etc). I tried

more tables or more rows

2012-08-05 Thread Eric Czech
I need to support data that comes from 30+ sources and the structure of that data is consistent across all the sources, but what I'm not clear on is whether or not I should use 30+ tables with roughly the same format or 1 table where the row key reflects the source. Anybody have a strong argument
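
A minimal sketch of the single-table option raised in this question, assuming a row key built from a source identifier followed by a per-record identifier. The function name `make_row_key`, the `|` separator, and the sample identifiers are hypothetical and are not taken from the thread or from any HBase API.

```python
def make_row_key(source_id: str, record_id: str, sep: str = "|") -> bytes:
    """Build a row key whose prefix identifies the data source (hypothetical layout)."""
    if sep in source_id:
        # Keep the prefix unambiguous so keys can be split back into their parts.
        raise ValueError("source_id must not contain the separator")
    return f"{source_id}{sep}{record_id}".encode("utf-8")


# Invented sample records from two sources, deliberately out of order.
records = [("src02", "0007"), ("src01", "0042"), ("src02", "0001"), ("src01", "0003")]

# Row keys compare byte by byte, so sorting groups each source's rows together.
keys = sorted(make_row_key(source, record) for source, record in records)
```

Because the keys sort bytewise, all rows from one source end up adjacent, so a scan over a single prefix would read just that source. Whether this beats 30+ separate tables depends on the workload; the sketch only shows the key layout.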

Re: more tables or more rows

2012-08-05 Thread Mohammad Tariq
Hello sir, Going for a single table with 30+ rows would be a better choice, if the data from all the sources is not very different. Since you are considering HBase as your data store, it wouldn't be wise to have several small rows. The major purpose of HBase is to host very large tables

Re: column based or row based storage for HBase?

2012-08-05 Thread Lin Ma
Thank you for the informative reply, Mohit! Some more comments, 1. actually my confusion about column-based storage comes from the book HBase: The Definitive Guide, chapter 1, section "The Dawn of Big Data", which draws a picture showing HBase storing the same column of all the different rows contiguously

Re: column based or row based storage for HBase?

2012-08-05 Thread yonghu
In my understanding of the column-oriented structure of HBase, the first thing is the term column-oriented. The meaning is that the data which belongs to the same column family is stored contiguously on disk. For each column family, the data is stored as a row store. If you want to understand the
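
The layout described in this message can be pictured with a small in-memory model: cells are first split by column family, and within each family they are kept in row order. This is only an illustration of the idea, not HBase code; the function `split_by_family` and all sample cells are invented for the example.

```python
from collections import defaultdict

# Sample cells as (row key, column family, column, value); all values are invented.
cells = [
    ("row2", "info", "name", "b"),
    ("row1", "stats", "hits", "7"),
    ("row1", "info", "name", "a"),
    ("row2", "stats", "hits", "3"),
    ("row1", "info", "email", "a@example.org"),
]


def split_by_family(cells):
    """Group cells per column family, then sort each group by (row, column)."""
    stores = defaultdict(list)
    for row, family, column, value in cells:
        stores[family].append((row, column, value))
    # Each family's entries are ordered by row first: a row store per family.
    return {family: sorted(entries) for family, entries in stores.items()}


stores = split_by_family(cells)
```

In this model, reading only the `stats` family never touches the `info` entries, while all columns of one row within a family stay next to each other.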

Re: column based or row based storage for HBase?

2012-08-05 Thread Lin Ma
Hi Lars, What do you mean by a set of keys that have the same row key being colocated? It would be appreciated if you could show an example or provide more information. regards, Lin On Mon, Aug 6, 2012 at 3:42 AM, lars hofhansl lhofha...@yahoo.com wrote: Hi Lin, HBase stores key-value mappings

Re: column based or row based storage for HBase?

2012-08-05 Thread lars hofhansl
A key in HBase looks like this: (rowkey, column family, column, timestamp) HBase will do two things for you: 1. All keys that have the same row key are stored in the same region 2. All keys are sorted (The column family is special in that each column family has its own store file, but the
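
The key structure above can be sketched with plain tuples to show what "all keys are sorted" implies for keys sharing a row key. This is a toy model rather than HBase's comparator: the sample keys are invented, and the choice to sort newer timestamps first within a column is an assumption of this sketch, not something stated in the message.

```python
from itertools import groupby

# Keys as (rowkey, column family, column, timestamp); sample data is invented.
keys = [
    ("row2", "cf", "a", 100),
    ("row1", "cf", "b", 200),
    ("row1", "cf", "a", 300),
    ("row1", "cf", "a", 100),
]


def sort_key(key):
    """Order by row key, family, column, then timestamp descending (assumed)."""
    row, family, column, timestamp = key
    return (row, family, column, -timestamp)


ordered = sorted(keys, key=sort_key)

# After sorting, keys with the same row key form one contiguous run.
by_row = {row: list(group) for row, group in groupby(ordered, key=lambda k: k[0])}
```

Since the row key is the leading component, sorting alone already places every key of a row side by side, which is what makes it natural to keep a row inside a single region.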