*Context*:

I recently noticed that OpenTSDB packs its rows by time period, so each row
ends up with tens to hundreds of columns. It claims that this design makes
row seeking more efficient (see the slides "Lessons learned from openTSDB").

*My argument*:

 If *a block of an HFile* is indexed by its own start key, and that key is
made up of {row, cf, cq}, then I think the read time for a specific key
should be the same for tall and wide tables alike, since the physical
storage is sorted by the full key, not only by the rowkey.

 So under a single column family, rowkey+column acts as one key as a whole:
shifting part of the rowkey into the column is the same as shifting that
part to the tail of the rowkey, and vice versa.

Following this logic, from the physical point of view all OpenTSDB did was
change the key by shifting a portion of the timestamp bytes to the position
right behind the rowkey, that is, into the column qualifier.
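
To make the argument concrete, here is a minimal Python sketch of the two
layouts. It is only an illustration under simplifying assumptions: the key is
reduced to a (rowkey, qualifier) pair, while a real HBase key also carries
length fields, the column family, a timestamp and a type. The names
tall_key, wide_key and PERIOD are made up for this example, and the qualifier
encoding is not OpenTSDB's actual one.

```python
import struct

PERIOD = 3600  # assumed packing period in seconds (one hour)


def tall_key(metric, ts):
    """Tall layout: the full timestamp sits in the rowkey, one cell per row."""
    return (metric + struct.pack(">I", ts), b"")


def wide_key(metric, ts):
    """Wide layout: the rowkey holds the period start, the qualifier the offset."""
    base = ts - ts % PERIOD
    return (metric + struct.pack(">I", base), struct.pack(">H", ts % PERIOD))


metric = b"cpu"
timestamps = [7200, 10, 3599, 3600, 3605, 0]

# Sort the same data points by each layout's (rowkey, qualifier) key.
tall_order = sorted(timestamps, key=lambda ts: tall_key(metric, ts))
wide_order = sorted(timestamps, key=lambda ts: wide_key(metric, ts))
```

With fixed-width big-endian fields, both layouts put the data points in the
same physical order, which is the basis of my argument. What the sketch does
not model is the per-cell overhead (each cell repeats its full rowkey), which
may differ between the two layouts.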

*Question*:

1. When getting a packed row holding one hour of data (a get is a special
scan, right?), or scanning over a one-hour range of rows, I don't see where
any performance improvement could come from. So why does OpenTSDB say that
packed rows perform better for row seeking?

2. Almost every doc and book recommends the tall table design. The book
"HBase in Action" in particular says that read performance is, among other
things, a reason why the tall design is adopted, though I still don't
understand why.

3. Also, the KeyValues inside a block are searched by *linear* scan, and
the start keys of the blocks are searched by binary search, right?
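
The lookup I have in mind for question 3 can be sketched as follows. This is
my mental model only, not HBase code: the helpers build_blocks and seek are
hypothetical, keys are simplified to (row, qualifier) tuples, and a block is
just a fixed number of sorted keys.

```python
from bisect import bisect_right


def build_blocks(sorted_keys, block_size):
    """Split an already sorted key list into fixed-size blocks."""
    return [sorted_keys[i:i + block_size]
            for i in range(0, len(sorted_keys), block_size)]


def seek(blocks, index, target):
    """Binary-search the block index, then scan the chosen block linearly.

    Returns (block_number, keys_examined) or None if the key is absent.
    """
    i = bisect_right(index, target) - 1  # last block whose start key <= target
    if i < 0:
        return None
    for steps, key in enumerate(blocks[i], start=1):
        if key == target:
            return (i, steps)
        if key > target:  # keys are sorted, so the target cannot follow
            break
    return None


keys = [("r%02d" % r, "q%d" % q) for r in range(5) for q in range(3)]
blocks = build_blocks(keys, 4)
index = [block[0] for block in blocks]  # one start key per block

found = seek(blocks, index, ("r02", "q1"))
missing = seek(blocks, index, ("r02", "q9"))
```

In this model the cost of a seek depends only on the number of blocks and
the number of keys per block, not on whether a given byte sits in the rowkey
or in the qualifier.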

Any hints are much appreciated.
