You may have seen this: http://hbase.apache.org/book.html#schema.smackdown
bq. are part of one column family

Are the columns equally likely to be read? I ask because you may be able to use the essential column family feature by separating the columns that tend to be accessed more frequently into their own column family.

0.94 is quite old. Any chance of rerunning your benchmark on HBase 1.x?

Thanks

On Thu, Sep 10, 2015 at 9:00 AM, Melvin Kanasseril <melvin.kanasse...@sophos.com> wrote:

> Hi,
>
> This has probably come up before, but I wanted to know if there is a
> recommendation around having tables with all attribute data as separate
> columns vs. an approach with most of the attribute data stored as a blob in
> a single column and the rest as separate columns (for column filter
> searches). I am aware of the limitations of lumping the data into a blob
> but was curious to see if there is an improvement in throughput/latency.
>
> I am leaning towards there not being much of a difference, or this being a
> micro-optimization not worth the tradeoff, but when we ran a set of
> benchmarks to test this (on ver 0.94), the hybrid approach with the blob
> data seems to show a 10-12% improvement in write throughput for the same
> number of client threads, with evenly distributed puts over a pre-split table
> on a 12 node cluster. I used Avro for serialization and all the columns
> (there are about 40 without the blob column and 10 with it) are part of one
> column family. The size of data for a row is around 5 MB before
> serialization. Any thoughts on whether this is worth pursuing?
>
> Thanks,
> Melvin