Multiple column families - scan performance

ps0618 Mon, 14 Aug 2017 21:10:51 -0700

I have 2 HBase tables: one with a single column family, and the other with 4
column families. Both tables use the same rowkey, and each column family has
a single column qualifier whose value is a JSON string (each JSON payload is
about 10-20K in size). All column families use FAST_DIFF data block encoding
and GZ (gzip) compression.


After loading about 60 million rows into each table, a scan of any single
column family in the 2nd table takes 4x as long as a scan of the single
column family in the 1st table. In both cases, the scanner is bounded by a
start and stop key so that it covers 1 million rows. Performance did not
change much even after running a major compaction on both tables.
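
For a sense of scale, here is a rough sketch of how much value data one such
bounded scan covers. It assumes "K" means 1,000 bytes and ignores rowkey
overhead, encoding and compression; the class and method names are made up
for illustration only.

```java
public class ScanVolumeEstimate {

    // Uncompressed value bytes covered by a scan: rows * bytes per value.
    // multiplyExact throws instead of silently overflowing.
    static long uncompressedBytes(long rows, long bytesPerValue) {
        return Math.multiplyExact(rows, bytesPerValue);
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;     // rows between the start and stop key
        long lowBytes = 10_000L;    // assumption: "10K" = 10,000 bytes
        long highBytes = 20_000L;   // assumption: "20K" = 20,000 bytes

        long low = uncompressedBytes(rows, lowBytes);
        long high = uncompressedBytes(rows, highBytes);

        // Report in decimal gigabytes (1 GB = 1e9 bytes).
        System.out.printf("one column family, %d rows: %.1f to %.1f GB%n",
                rows, low / 1e9, high / 1e9);
    }
}
```

By this estimate, a scan of one column family covers the same volume of value
data in either table, since the row count and payload sizes are the same.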

Though the HBase documentation and other tech forums recommend against using
more than 1 column family per table, nothing I have read so far suggests that
scan performance degrades linearly with the number of column families. Has
anyone else experienced this, and is there a simple explanation for it?

To note, the second table has 4 column families because, even though I only
scan one column family at a time now, there are requirements to scan
multiple column families from that table for a given set of rowkeys.

Thanks for any insight into the performance question.



--
View this message in context: 
http://apache-hbase.679495.n3.nabble.com/Multiple-column-families-scan-performance-tp4089733.html
Sent from the HBase User mailing list archive at Nabble.com.
