Thank you all.

Facts learned:

- Having 130 column families is too much. Don't do that.
- While scanning, an entire row will be read for filtering, unless the HBASE-5416
technique is applied, which loads only the relevant column family for the filter. (But it
seems that one still can't load just the one column needed while scanning.)
- Large rows are probably not a good idea.
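To check my understanding of the HBASE-5416 point, here is a toy model in plain Java. It is not HBase code: the class, the "small" and "large" families, and the read counter are all made up for illustration. It only counts how often the expensive family would be read, with and without on-demand loading. In the real client this is, as far as I know, switched on per scan with `Scan.setLoadColumnFamiliesOnDemand(true)`, combined with a filter that reports which families are essential.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Toy model (not HBase code) of the HBASE-5416 idea: the filter looks only at a
 * small "essential" column family, and the large family is read only for rows
 * the filter accepts. Family names and contents are hypothetical.
 */
public class OnDemandScanModel {

    /** One row: a small family the filter inspects and a large payload family. */
    static final class Row {
        final String key;
        final Map<String, String> smallFamily; // cheap to read, used by the filter
        final Map<String, String> largeFamily; // expensive, e.g. ~5 MB values

        Row(String key, Map<String, String> smallFamily, Map<String, String> largeFamily) {
            this.key = key;
            this.smallFamily = smallFamily;
            this.largeFamily = largeFamily;
        }
    }

    /** Counts how many times the large family had to be read. */
    int largeFamilyReads = 0;

    /**
     * Returns the keys of rows whose small family passes the filter.
     * onDemand == false: every row is read in full before filtering.
     * onDemand == true: the large family is read only after the filter passes.
     */
    List<String> scan(List<Row> rows, Predicate<Map<String, String>> filter, boolean onDemand) {
        List<String> matches = new ArrayList<>();
        for (Row row : rows) {
            if (!onDemand) {
                largeFamilyReads++; // whole row loaded up front, match or not
            }
            if (filter.test(row.smallFamily)) {
                if (onDemand) {
                    largeFamilyReads++; // payload loaded lazily, matching rows only
                }
                matches.add(row.key);
            }
        }
        return matches;
    }
}
```

With three rows of which two match, the eager scan reads the large family three times and the on-demand scan only twice; the gap grows with how selective the filter is.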

Currently it seems appropriate to follow the one-column solution that Alok
Singh suggested, in part because there is at the moment no reasonable grouping of the
fields.

Here is my current thinking:

- One column family, one column. The field name will be included in the rowkey.
- Eliminate filtering altogether (in most cases) by ordering the rowkey
components properly.
- If filtering is absolutely needed, add a 'dummy' column family and apply the
HBASE-5416 technique to minimize disk reads, since a field value can be
large (~5 MB). (This dummy column family idea may not be right; I'm not sure, since I
have not yet read the filtering section of the book I'm reading.)
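A rough sketch of the rowkey layout in the first two points, under my own assumptions: the entity id comes first and the field name second, joined by a `\u0000` separator, and a `TreeMap` stands in for the table because HBase keeps rows sorted by key. All names here (`FieldRowKeys`, `rowKey`, `getField`, `fieldNames`, the `user4` ids) are hypothetical. In real client code the keys would be `byte[]`, a single field would be a Get, and the prefix lookup would be a Scan with start and stop rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

/**
 * Sketch of "one family, one column, field name in the rowkey".
 * A NavigableMap stands in for an HBase table: rows are kept sorted by key,
 * so a good component order turns lookups into a Get or a prefix range scan
 * instead of a filter. String order matches HBase's byte order for ASCII keys.
 */
public class FieldRowKeys {

    // Separator between rowkey components; assumed never to occur in an entity id.
    static final char SEP = '\u0000';

    /** Rowkey = entityId + SEP + fieldName, so one entity's fields are adjacent. */
    static String rowKey(String entityId, String fieldName) {
        return entityId + SEP + fieldName;
    }

    /** One field: an exact-key lookup (the analogue of a Get), no filter needed. */
    static String getField(NavigableMap<String, String> table, String entityId, String fieldName) {
        return table.get(rowKey(entityId, fieldName));
    }

    /** All field names of one entity: a range scan over [entityId+SEP, entityId+(SEP+1)). */
    static List<String> fieldNames(NavigableMap<String, String> table, String entityId) {
        String start = entityId + SEP;
        String stop = entityId + (char) (SEP + 1); // smallest key past the prefix
        List<String> names = new ArrayList<>();
        for (String key : table.subMap(start, true, stop, false).keySet()) {
            names.add(key.substring(start.length()));
        }
        return names;
    }
}
```

The separator matters: without it, a prefix scan for `user4` would also pick up rows of `user42`.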

I hope I am not missing or misunderstanding something...
(I'm a total newbie. I started reading an HBase book last week...)