So on the 2nd table, even though there are 4 CFs, you need data from only a single CF while scanning. And is the CF under test similar to the one you have in the 1st table? I mean the same encoding and compression scheme, and the same data size? How do you create the Scan for the 2nd table? I hope you do

    // startRow, stopRow and cf stand for your own byte[] values
    Scan s = new Scan();
    s.setStartRow(startRow);
    s.setStopRow(stopRow);
    s.addFamily(cf);

Correct?

-Anoop-

On Thu, Aug 17, 2017 at 4:42 PM, Partha <parthaema...@gmail.com> wrote:
> I have 2 HBase tables - one with a single column family, and other has 4
> column families. Both tables are keyed by same rowkey, and the column
> families all have a single column qualifier each, with a json string as
> value (each json payload is about 10-20K in size). All column families use
> fast-diff encoding and gzip compression.
>
> After loading about 60MM rows to each table, a scan test on (any) single
> column family in the 2nd table takes 4x the time to scan the single column
> family from the 1st table. In both cases, the scanner is bounded by a start
> and stop key to scan 1MM rows. Performance did not change much even after
> running a major compaction on both tables.
>
> Though HBase doc and other tech forums recommend not using more than 1
> column family per table, nothing I have read so far suggests scan
> performance will linearly degrade based on number of column families. Has
> anyone else experienced this, and is there a simple explanation for this?
>
> To note, the reason second table has 4 column families is even though I
> only scan one column family at a time now, there are requirements to scan
> multiple column families from that table given a set of rowkeys.
>
> Thanks for any insight into the performance question.
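
For a sense of scale on the data-size question, the figures in the quoted message (1MM rows per scan, one JSON value of about 10-20K per row) put the raw payload of a single-CF scan at roughly 10 to 20 GB before encoding and compression. The sketch below only does that arithmetic. It assumes "K" means 1,000 bytes and ignores rowkey, qualifier and timestamp overhead; the class and method names are made up for illustration and are not part of HBase.

```java
public class ScanVolumeEstimate {

    // Raw (uncompressed) value bytes read when scanning `rows` rows
    // that each carry one value of `bytesPerValue` bytes.
    // multiplyExact throws instead of silently overflowing.
    public static long rawBytes(long rows, long bytesPerValue) {
        return Math.multiplyExact(rows, bytesPerValue);
    }

    // Decimal gigabytes (1 GB = 1e9 bytes), an assumption made for readability.
    public static double toGigabytes(long bytes) {
        return bytes / 1e9;
    }

    public static void main(String[] args) {
        long rowsPerScan = 1_000_000L;            // "1MM rows" from the quoted message
        long low = rawBytes(rowsPerScan, 10_000L);  // ~10K payload per row
        long high = rawBytes(rowsPerScan, 20_000L); // ~20K payload per row
        System.out.printf("Raw JSON per single-CF scan: %.0f to %.0f GB%n",
                toGigabytes(low), toGigabytes(high));
    }
}
```

If the CF scanned in each table really holds payloads of comparable size, both tests should be reading about this much raw data, so a 4x gap would have to come from somewhere other than the volume of the selected CF.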