2011/4/23 Panayotis Antonopoulos <[email protected]>:
>
> I am also a beginner, so I would like to ask you something about the method
> you proposed.
> HBase is column-oriented. This means (as far as I know from databases) that
> it stores its data column by column and not row by row.

Fortunately, this is an oversimplification. HBase makes data efficiently
accessible by row. Strictly speaking, it is not even a column-oriented
database. It's a column-family-oriented database. From the docs:
"Physically they are stored on a per-column family basis."

> If we use the schema you suggested then when we want some of the documents
> for a single word we will have to access many columns and I think this will
> cost us a lot.

No, it is very efficient, even more so if you access columns from a
single column family only. AFAIK, there is no way to access HBase
by column only, without being in the context of a specific row.

> I think that the locality of the data is lost using this schema.

No, I don't think so.

> I repeat that I am a beginner so please correct me if I am wrong.

This presentation might help:
http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies

Bernd

>
> Regards,
> Panagiotis.
>
>> Date: Sat, 23 Apr 2011 11:25:47 +0200
>> Subject: Re: HBase - Column family
>> From: [email protected]
>> To: [email protected]
>>
>> That's how I would do it:
>> What's nice in HBase is that you can store all the data for one of
>> your keywords in a single row.
>> Create a column family "doc_id".
>> Now, for each word, you create one row.
>> In this row, for each matching document you create one column (that's
>> the gotcha compared to an RDB design).
>> The name of the column is the doc id. The column's cell content is the
>> weight.
>>
>> So, following your example you'd get:
>>
>> row id | column-family:column....
>> HELLO  | doc_id:2 | doc_id:3 | doc_id:4
>>
>> and column values:
>> doc_id:2 | doc_id:3 | doc_id:4
>> 12       | 45       | 36
>>
>> HTH,
>>
>> Bernd
>>
>>
>> On Sat, Apr 23, 2011 at 09:56, JohnJohnGa <[email protected]> wrote:
>> > Hi, I'm a beginner in HBase. I need to design my table. I want to play
>> > with the following information:
>> >
>> > At the date XX-XX-XXXX, the word 'HELLO' is in documents 2,3,4 and the
>> > weight of each doc is 12,45,36 - My raw data: doc:D title:'i like
>> > potatoes',weight:W,date:D
>> >
>> > I created a table with row: word, column: date, value: doc. But I can't
>> > store multiple rows with the same date for the same word.
>> >
>> > Can we create multiple column families for a table? What would be the
>> > best way to design the schema?
>> >
>> > Thanks a lot
>> >
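
P.S. For readers who prefer code, the word-per-row layout discussed in this
thread can be sketched roughly as follows. This is only an in-memory model in
plain Python, not the HBase client API: the helper names `put` and
`get_family` are made up for illustration, and real HBase stores row keys,
qualifiers and values as bytes. It is meant to show why all documents for one
word come back from a single row lookup.

```python
# In-memory stand-in for the layout: table[row_key][family][qualifier] = value.
# Hypothetical helpers for illustration only; this is NOT the HBase API.

def put(table, row_key, family, qualifier, value):
    """Store one cell, creating the row and column family on demand."""
    table.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value


def get_family(table, row_key, family):
    """Return a copy of all cells of one column family in one row."""
    return dict(table.get(row_key, {}).get(family, {}))


table = {}

# One row per word; one column per matching document; cell content = weight.
for doc, weight in [(2, 12), (3, 45), (4, 36)]:
    put(table, "HELLO", "doc_id", str(doc), weight)

# A single row lookup yields every document and weight for the word.
hello_docs = get_family(table, "HELLO", "doc_id")
print(hello_docs)
```

Looking up a word that was never stored simply gives an empty result in this
model, since no row exists for it.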
