Plus, the value of each field can be ~5 MB, since up to 250,000 lines of the source data are merged into one record to match the request pattern.
-----Original Message-----
From: innowireless TaeYun Kim [mailto:taeyun....@innowireless.co.kr]
Sent: Tuesday, August 05, 2014 8:11 PM
To: user@hbase.apache.org
Subject: Question on the number of column families

Hi,

According to http://hbase.apache.org/book/number.of.cfs.html, having more than 2~3 column families is strongly discouraged. BTW, in my case, records in a table have the following characteristics:

- The table is read-only. It is bulk-loaded once. When new data is ready, a new table is created and the old table is deleted.
- The size of the source data can be hundreds of gigabytes.
- A record has about 130 fields.
- The number of fields in a record is fixed.
- The names of the fields are also fixed. (It's like a table in an RDBMS.)
- About 40 fields (it varies) mostly have values, while the other fields are mostly empty (null in an RDBMS).
- It is unknown which fields will be dense; it depends on the source data.
- Fields are accessed independently. Normally a user requests just one field, though a user can request several fields.
- The range of a range query is the same for all fields. (No wider, no narrower, regardless of the data density.)

For me, it seems that it would be more efficient to have one column family for each field, since it would cost less disk I/O: only the needed column's data would be read. (A sketch of this read pattern follows below the quoted message.)

Can the table have 130 column families for this case? Or must all the columns be in one column family?

Thanks.
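To make the read pattern concrete, here is a rough sketch of how a single-field range scan would look, assuming the HBase 1.x Java client API. The table name "measurements", the family names "d" and "field_042", and the qualifier "v" are hypothetical placeholders, and the two layouts the comments describe are exactly the alternatives being asked about, not a recommendation.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SingleFieldRangeScan {

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("measurements"))) {

            // Range query over a row-key interval, requesting exactly one field.
            Scan scan = new Scan();
            scan.setStartRow(Bytes.toBytes("row-000100"));
            scan.setStopRow(Bytes.toBytes("row-000200"));
            // Each value may be ~5 MB, so keep the scanner cache small
            // to bound client-side memory use.
            scan.setCaching(10);

            // Layout A: a single column family "d" with one qualifier per field.
            // Only the requested cell is returned to the client, but on disk
            // every store file of "d" (holding all 130 fields) may be read.
            scan.addColumn(Bytes.toBytes("d"), Bytes.toBytes("field_042"));

            // Layout B (the proposal in the question): one family per field, e.g.
            //   scan.addColumn(Bytes.toBytes("field_042"), Bytes.toBytes("v"));
            // Each family has its own store files, so only that field's data
            // would be read from disk.

            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    byte[] value = row.getValue(Bytes.toBytes("d"), Bytes.toBytes("field_042"));
                    // ... process the value for this row key ...
                }
            }
        }
    }
}

Either way the client receives only the requested field; the difference is on the server side, where each column family is stored in its own set of store files, so a per-field family limits disk reads to that field, at the cost of the region carrying 130 families' worth of stores and memstores.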