Sorry, accidentally hit send. I'm guessing a 10 minute time slice would drop their space savings from 4-8x down to 2-4x. On Aug 27, 2013 11:30 PM, "Chris Perluss" <tradersan...@gmail.com> wrote:
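A rough back-of-envelope sketch of the arithmetic being guessed at above. Every byte count here is an assumption for illustration (a 30-byte OpenTSDB-style row key, ~24 bytes of fixed per-KeyValue overhead, 8-byte values); real HBase overhead depends on version, encoding, and compression, so the exact ratios will differ:

```python
# Back-of-envelope estimate of storage cost for one hour of per-second
# data, as a function of how many KeyValues (cells) the hour is split into.
# All sizes are hypothetical assumptions for illustration.

ROW_KEY = 30   # assumed row key size: metric id + base timestamp + tags
CF = 1         # assumed column family name, one byte
QUALIFIER = 2  # assumed qualifier: a 2-byte time offset within the row
KV_FIXED = 24  # assumed fixed per-KeyValue overhead (lengths, ts, type)
VALUE = 8      # assumed bytes per data point

PER_KV = ROW_KEY + CF + QUALIFIER + KV_FIXED  # key bytes repeated per cell

def bytes_per_hour(cells):
    """Bytes for 3600 per-second points stored as `cells` KeyValues, with
    the points packed into each cell's value blob (2-byte offset each)."""
    points = 3600 // cells
    value_bytes = VALUE if points == 1 else points * (2 + VALUE)
    return cells * (PER_KV + value_bytes)

unpacked = bytes_per_hour(3600)  # one KeyValue per data point
ten_min = bytes_per_hour(6)      # one packed cell per 10-minute slice
hourly = bytes_per_hour(1)       # one packed cell per hour

print(unpacked / hourly)   # savings from hourly packing
print(unpacked / ten_min)  # savings from 10-minute packing
print(ten_min - hourly)    # cost of 5 extra copies of the repeated key
```

Under these assumed sizes the dominant saving comes from eliminating per-point KeyValue overhead, and the difference between one row and six rows per hour is just five extra copies of the repeated key bytes.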
> I'm still kinda new to HBase, so please excuse me if I'm wrong. I suspect
> the reason has to do with a different slide from their presentation, where
> they run a job every hour to combine all the cells from the previous hour
> into one cell.
>
> OpenTSDB has quite a long row key. It contains the metric name, the
> timestamp, and numerous optional tags. If you wrote one metric every
> second, you would write 3600 columns per row key. Since the row key is
> very long, it takes quite a bit of space to store the same row key 3600
> times. By combining an hour's worth of data into one cell, OpenTSDB
> claims a 4-8x storage saving.
>
> If they stayed with their original 10 minute time slice, they would have
> to store their giant row key 6 times per hour instead of once. I'm going
> to guess this
>
> On Aug 27, 2013 10:50 PM, "林煒清" <thesuperch...@gmail.com> wrote:
>
>> *Context*:
>>
>> Recently I noticed that OpenTSDB packs its rows by time period, ending
>> up with tens to hundreds of columns per row. They claim this design is
>> more efficient for row seeking (slide: "Lessons learned from OpenTSDB").
>>
>> *My argument*:
>>
>> If an HFile block is indexed by its own start key, and that key is made
>> of {row, cf, cq}, then I think the read time for a specific key should
>> be the same for both tall and wide tables, since the physical storage is
>> sorted by the full key, not only by the row key.
>>
>> In other words, within one column family, rowkey+column acts as a single
>> key: shifting part of the row key into the column is the same as
>> shifting it to the tail of the row key, and vice versa.
>>
>> Following this logic, at the physical level all OpenTSDB did is change
>> the key layout by shifting a portion of the timestamp bytes to the
>> position behind the row key, i.e. into the column qualifier.
>>
>> *Questions*:
>>
>> 1. When getting a packed row worth of one hour (a get is a special scan,
>> right?), or scanning over a one-hour range of rows, I don't see how
>> there could be any performance improvement. So why does OpenTSDB say
>> packed rows give better performance for row seeking?
>>
>> 2. Almost every doc and book recommends the tall table design; "HBase in
>> Action" in particular says that, among other things, read performance is
>> the reason tall tables are adopted, though I still can't see why.
>>
>> 3. Also, the KeyValues inside a block are searched by *linear* scan,
>> while the start keys of blocks are found by binary search, right?
>>
>> Any hint is much appreciated.
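The lookup path described in question 3 can be sketched with a toy model: a block index mapping each block's start key to its block, a binary search over those start keys to pick the block, then a linear scan of the KeyValues inside it. This is an illustrative sketch only, not HBase's actual implementation; the key layout and block contents are made up:

```python
# Toy model of an HFile read: binary-search the block index by start key,
# then linearly scan the KeyValues inside the selected block.
import bisect

# Each key is a (rowkey, qualifier) pair, sorted as a whole -- the flat
# {row, cf, cq} key ordering the question describes (cf omitted here).
blocks = [
    # (block start key, sorted KeyValues in that block)
    ((b"metric1|h0", b"000"), [((b"metric1|h0", b"000"), 1.0),
                               ((b"metric1|h0", b"600"), 2.0)]),
    ((b"metric1|h1", b"000"), [((b"metric1|h1", b"000"), 3.0),
                               ((b"metric1|h1", b"600"), 4.0)]),
]
start_keys = [b[0] for b in blocks]  # the in-memory block index

def get(key):
    # binary search: last block whose start key is <= the target key
    i = bisect.bisect_right(start_keys, key) - 1
    if i < 0:
        return None
    # linear scan of KeyValues within the one chosen block
    for k, v in blocks[i][1]:
        if k == key:
            return v
    return None

print(get((b"metric1|h1", b"600")))  # index search picks block 1, scan finds it
```

Note the search cost here depends only on the total sorted key, which is the questioner's point: moving bytes between the row key and the qualifier does not change where the key lands in this ordering.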