It obviously won't matter if your columns are fat, but there are several cases (at least, I can think of several) where you need to store, for example, just an integer column name and an empty column value. Such a column takes 12 bytes, of which 8 bytes are just the overhead of storing the timestamp, and that doesn't look very nice. And skinny columns are a very common use case, I believe.
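To make the arithmetic concrete, here is a rough sketch using only the figures mentioned in this thread: a 4-byte integer column name, an empty value, and an 8-byte timestamp. The helper `column_size` is hypothetical, not Cassandra code, and it deliberately ignores any other per-column serialization overhead, so treat the numbers as illustrative rather than as the real on-disk layout.

```python
# Hypothetical helper (not part of Cassandra): per-column size using the
# simplified model from this thread, i.e. name + value + timestamp only.
def column_size(name_bytes, value_bytes, timestamp_bytes=8):
    """Return (total_bytes, fraction_of_total_taken_by_timestamp)."""
    total = name_bytes + value_bytes + timestamp_bytes
    return total, timestamp_bytes / total

# Skinny column: 4-byte integer name, empty value, 8-byte timestamp.
skinny_total, skinny_overhead = column_size(name_bytes=4, value_bytes=0)

# Fat column: same name, 100-byte value (an assumed size for comparison).
fat_total, fat_overhead = column_size(name_bytes=4, value_bytes=100)

# Same skinny column if the timestamp were a 4-byte offset instead.
short_total, short_overhead = column_size(4, 0, timestamp_bytes=4)

print(skinny_total, round(skinny_overhead, 2))
print(fat_total, round(fat_overhead, 2))
print(short_total, round(short_overhead, 2))
```

Under these assumptions the timestamp is two thirds of a skinny column but only a small fraction of a fat one, which is why the overhead matters mainly for the skinny case.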
On Thu, Jan 19, 2012 at 1:26 PM, Maxim Potekhin <potek...@bnl.gov> wrote:

> I must have accidentally deleted all messages in this thread save this one.
>
> On the face value, we are talking about saving 2 bytes per column. I know
> it can add up with many columns, but relative to the size of the column --
> is it THAT significant?
>
> I made an effort to minimize my CF footprint by replacing the "natural"
> column keys with integers (and translating back and forth when writing and
> reading). It's easy to see that in my case I achieve almost 50% storage
> savings and at least 30%. But if the column in question contains more than
> 20 bytes -- what's up with trying to save 2?
>
> Cheers
>
> Maxim
>
> On 1/18/2012 11:49 PM, Ertio Lew wrote:
>
>> I believe the timestamps *on per column basis* are only required until
>> the compaction time after that it may also work if the timestamp range
>> could be specified globally on per SST table basis. and thus the
>> timestamps until compaction are only required to be measure the time
>> from the initialization of the new memtable to the point the column is
>> written to that memtable. Thus you can easily fit that time in 4
>> bytes. This I believe would save atleast 4 bytes overhead for each
>> column.
>>
>> Is anything related to these overheads under consideration/ or planned
>> in the roadmap ?
>>
>> On Tue, Sep 6, 2011 at 11:44 AM, Oleg Anastastasyev <olega...@gmail.com>
>> wrote:
>>
>>>> I have a patch for trunk which I just have to get time to test a bit
>>>> before I submit.
>>>> It is for super columns and will use the super columns timestamp as the
>>>> base and only store variant encoded offsets in the underlying columns.
>>>
>>> Could you please measure how much real benefit it brings (in real RAM
>>> consumption by JVM). It is hard to tell will it give noticeable results
>>> or not.
>>> AFAIK memory structures used for memtable consume much more memory. And
>>> 64-bit JVM allocates memory aligned to 64-bit word boundary. So 37% of
>>> memory consumption reduction looks doubtful.