Re: Using 5-6 bytes for cassandra timestamps vs 8…

Ertio Lew Thu, 19 Jan 2012 00:13:09 -0800

Obviously it won't matter if your columns are fat, but there are several
cases (at least I can think of several) where you need to store, for
example, just an integer column name and an empty column value. There, 8 of
the column's 12 bytes are pure overhead for the timestamp, which doesn't
look very nice. And skinny columns are a very common use case, I believe.
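
As a rough sketch of that arithmetic, assuming a 4-byte integer column name,
an empty value, and ignoring any other per-column serialization overhead
(the function name below is only illustrative):

```python
def timestamp_share(name_bytes, value_bytes, timestamp_bytes=8):
    """Return (total bytes per column, fraction taken by the timestamp).

    Simplified model: only name, value and timestamp are counted.
    Any length prefixes or flags a real on-disk format adds are ignored.
    """
    total = name_bytes + value_bytes + timestamp_bytes
    return total, timestamp_bytes / total


if __name__ == "__main__":
    # 4-byte integer name, empty value, with 8-, 6- and 5-byte timestamps
    for ts in (8, 6, 5):
        total, share = timestamp_share(4, 0, ts)
        print(f"{ts}-byte timestamp: {total} bytes per column, "
              f"{share:.0%} of it timestamp")
```

Under these assumptions the timestamp is 8 of 12 bytes (about 67%) today,
and would still be 6 of 10 or 5 of 9 bytes with a shorter encoding.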


On Thu, Jan 19, 2012 at 1:26 PM, Maxim Potekhin <potek...@bnl.gov> wrote:

> I must have accidentally deleted all messages in this thread save this one.
>
> At face value, we are talking about saving 2 bytes per column. I know
> it can add up with many columns, but relative to the size of the column --
> is it THAT significant?
>
> I made an effort to minimize my CF footprint by replacing the "natural"
> column keys with integers (and translating back and forth when writing and
> reading). It's easy to see that in my case I achieve storage savings of at
> least 30%, and almost 50% at best. But if the column in question contains
> more than 20 bytes -- what's the point of trying to save 2?
>
> Cheers
>
> Maxim
>
>
>
> On 1/18/2012 11:49 PM, Ertio Lew wrote:
>
>> I believe timestamps *on a per-column basis* are only required until
>> compaction time; after that, it may also work if the timestamp range
>> is specified globally on a per-SSTable basis. Thus, until compaction,
>> the timestamps only need to measure the time from the initialization
>> of the new memtable to the point the column is written to that
>> memtable, and that time easily fits in 4 bytes. This, I believe, would
>> save at least 4 bytes of overhead for each column.
>>
>> Is anything related to these overheads under consideration or planned
>> on the roadmap?
>>
>>
>>
>> On Tue, Sep 6, 2011 at 11:44 AM, Oleg Anastastasyev <olega...@gmail.com>
>> wrote:
>>
>>>> I have a patch for trunk which I just have to get time to test a bit
>>>> before I submit.
>>>> It is for super columns and will use the super column's timestamp as
>>>> the base and only store variant encoded offsets in the underlying
>>>> columns.
>>>
>>> Could you please measure how much real benefit it brings (in real RAM
>>> consumption by the JVM)? It is hard to tell whether it will give
>>> noticeable results or not.
>>> AFAIK the memory structures used for the memtable consume much more
>>> memory, and a 64-bit JVM allocates memory aligned to 64-bit word
>>> boundaries. So a 37% reduction in memory consumption looks doubtful.
>>>
>>>
>>>
>
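
The base-timestamp idea quoted above (one shared base, with only a small
offset stored per column) can be sketched roughly as follows. This is only
an illustration, not Cassandra's format or the patch mentioned in the
thread: the function names are hypothetical, timestamps are assumed to be
microseconds since the epoch, the base is simply taken as the smallest
timestamp in the batch, and the variable-length encoding is a generic
7-bits-per-byte (LEB128-style) varint.

```python
def encode_varint(n):
    """Encode a non-negative int using 7 payload bits per byte."""
    if n < 0:
        raise ValueError("offset must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data):
    """Decode a single varint produced by encode_varint."""
    result = 0
    shift = 0
    for b in data:
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


def encode_timestamps(timestamps):
    """Return (base, offsets): one shared base plus a varint per column."""
    base = min(timestamps)
    return base, [encode_varint(t - base) for t in timestamps]


def decode_timestamps(base, encoded):
    """Rebuild the full timestamps from the base and the stored offsets."""
    return [base + decode_varint(e) for e in encoded]


if __name__ == "__main__":
    start = 1326960789000000  # hypothetical base, microseconds since epoch
    stamps = [start, start + 1_500, start + 60_000_000]
    base, encoded = encode_timestamps(stamps)
    print([len(e) for e in encoded])  # bytes used per offset
    assert decode_timestamps(base, encoded) == stamps
```

In this example the three offsets take 1, 2 and 4 bytes instead of 8 each.
For a fixed 4-byte offset, the usable range depends on the unit: 2^32
microseconds is roughly 71 minutes, while 2^32 milliseconds is roughly 49
days, so the window would have to fit within whichever unit is chosen.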
