Here's how I would do it: what's nice in HBase is that you can store all the data for one of your keywords in a single row. Create a column family "doc_id". Then, for each word, you create one row. In this row, you create one column for each matching document (that's the gotcha compared to an RDBMS design, where each match would be a separate row). The name of the column (the column qualifier) is the doc id. The column's cell content is the weight.
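To make the layout concrete, here is a small sketch in plain Python. It does not talk to HBase and is not the HBase client API: the dict `table` and the helpers `put_weight` and `docs_for` are made-up names that only mimic the "one row per word, one column per document" idea.

```python
from collections import defaultdict

# In-memory stand-in for an HBase table (an assumption for illustration):
# row key -> {"family:qualifier": cell value}
table = defaultdict(dict)

def put_weight(word, doc_id, weight):
    """Store one (word, document) match as a column in the word's row."""
    table[word][f"doc_id:{doc_id}"] = weight

def docs_for(word):
    """Read the whole row for a word and return {doc_id: weight}."""
    row = table.get(word, {})
    return {int(col.split(":", 1)[1]): w for col, w in row.items()}

# The word 'HELLO' is in documents 2, 3, 4 with weights 12, 45, 36.
for doc_id, weight in [(2, 12), (3, 45), (4, 36)]:
    put_weight("HELLO", doc_id, weight)

print(docs_for("HELLO"))  # {2: 12, 3: 45, 4: 36}
```

The point of the sketch is that a single row read returns every document and weight for the word, which is what the wide-row design buys you.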
So, following your example, you'd get:

row id | column-family:column ...
HELLO  | doc_id:2 | doc_id:3 | doc_id:4

and column values:

doc_id:2 | doc_id:3 | doc_id:4
12       | 45       | 36

HTH,
Bernd

On Sat, Apr 23, 2011 at 09:56, JohnJohnGa <[email protected]> wrote:
> Hi, I'm a beginner in HBase. I need to design my table. I want to play with
> the following information:
>
> At the date XX-XX-XXXX, the word 'HELLO' is in document 2,3,4 and the weight
> of each doc is 12,45,36 - My raw data: doc:D title:'i like
> potatoes',weight:W,date:D
>
> I created a table with, row: word, column:date, value:doc But I can't store
> multiple row with the same date, for the same word.
>
> Can we create multiple column families for a table? What can be the best way
> to design the schema?
>
> Thanks a lot
