2011/4/23 Panayotis Antonopoulos <[email protected]>:
>
> I am also a beginner, so I would like to ask you something about the method 
> you proposed.
> HBase is column-oriented. This means (as far as I know from databases) that 
> it stores its data column by column and not row by row.

Fortunately, this is an oversimplification. HBase keeps data efficiently
accessible by row. Strictly speaking, it is not even a column-oriented
database. It's a column-family-oriented database. From the docs:
"Physically they are stored on a per-column family basis."

> If we use the schema you suggested then when we want some of the documents 
> for a single word we will have to access many columns and I think this will 
> cost us a lot.

No, it is very efficient, even more so if you access columns from a
single column family only.
AFAIK, there is no way to access HBase by column only, without being
in the context of a specific row.
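
To make this concrete, here is a rough sketch of why reading many columns
of one row is cheap. It is a toy model in plain Python, not the HBase API
or its real file format: it only assumes that, within one column family,
cells are kept sorted by row key and then by column name. The class
ColumnFamilyStore and its methods are made-up names, and the WORLD row is
invented sample data; the HELLO row follows the schema quoted further down.

```python
from bisect import bisect_left


class ColumnFamilyStore:
    """Toy model of one column family: cells sorted by (row, column)."""

    def __init__(self):
        self._keys = []    # sorted (row_key, column_name) tuples
        self._values = []  # cell values, aligned with self._keys

    def put(self, row, column, value):
        key = (row, column)
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing cell
        else:
            self._keys.insert(i, key)  # keep the store sorted
            self._values.insert(i, value)

    def get_row(self, row):
        """Return {column: value} for one row using one contiguous scan."""
        # (row,) sorts before every (row, column), so this finds the
        # first cell of the row, if the row exists.
        i = bisect_left(self._keys, (row,))
        result = {}
        while i < len(self._keys) and self._keys[i][0] == row:
            result[self._keys[i][1]] = self._values[i]
            i += 1
        return result


# Family "doc_id": one row per word, one column per document, value = weight.
doc_id = ColumnFamilyStore()
for word, doc, weight in [
    ("HELLO", "2", 12),
    ("WORLD", "7", 5),   # invented row, only here to show interleaved writes
    ("HELLO", "4", 36),
    ("HELLO", "3", 45),
]:
    doc_id.put(word, doc, weight)

hello_docs = doc_id.get_row("HELLO")
```

All cells of the HELLO row end up next to each other regardless of the
order they were written in, so fetching every document for a word is one
seek plus a short sequential read in this model.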

> I think that the locality of the data is lost using this schema.

No, I don't think so.

> I repeat that I am a beginner so please correct me if I am wrong.

This presentation might help:
http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies

  Bernd

>
> Regards,
> Panagiotis.
>
>> Date: Sat, 23 Apr 2011 11:25:47 +0200
>> Subject: Re: HBase - Column family
>> From: [email protected]
>> To: [email protected]
>>
>> That's how I would do it:
>> What's nice in HBase is that you can store all the data for one of
>> your keywords in a single row.
>> Create a column family "doc_id".
>> Now, for each word, you create one row.
>> In this row, for each matching document you create one column (that's
>> the gotcha compared to an RDB design).
>> The name of the column is the doc id. The column's cell content is the 
>> weight.
>>
>> So, following your example you'd get:
>>
>> row id | column-family:column....
>> HELLO |  doc_id:2 | doc_id:3 | doc_id:4
>>
>> and column values:
>> doc_id:2 | doc_id:3 | doc_id:4
>> 12 | 45 | 36
>>
>> HTH,
>>
>>   Bernd
>>
>>
>> On Sat, Apr 23, 2011 at 09:56, JohnJohnGa <[email protected]> wrote:
>> > Hi, I'm a beginner in HBase. I need to design my table. I want to play
>> > with the following information:
>> >
>> > At the date XX-XX-XXXX, the word 'HELLO' is in document 2,3,4 and the
>> > weight of each doc is 12,45,36 - My raw data: doc:D
>> > title:'i like potatoes',weight:W,date:D
>> >
>> > I created a table with row: word, column: date, value: doc. But I can't
>> > store multiple rows with the same date for the same word.
>> >
>> > Can we create multiple column families for a table? What can be the best
>> > way to design the schema?
>> >
>> > Thanks a lot
>> >
>> >
>
