Re: Merging the rows of two column families(with similar attributes) into one ??

Ertio Lew Tue, 08 Feb 2011 12:15:08 -0800

Thanks for adding that, Benjamin!


On Wed, Feb 9, 2011 at 1:40 AM, Benjamin Coverston
<ben.covers...@datastax.com> wrote:
>
>
> On 2/4/11 11:58 PM, Ertio Lew wrote:
>>
>> Yes, one disadvantage I see of having more CFs, in terms of memory
>> utilization, is this:
>>
>> If some CF is written to less often than the other CFs, its memtable
>> keeps occupying memory until it is flushed. That memory could have been
>> put to much better use by a CF that is heavily written and read. And if
>> you try to make the flush thresholds smaller, more compactions are
>> needed.
>>
>>
> One more disadvantage here is that with CFs that vary widely in write
> rate, you can also end up with fragmented commit logs, which in some cases
> we have seen actually fill up the commit log partition. As a consequence,
> one thing to consider would be to reduce the commit log flush threshold
> (in minutes) for the column families that do not see heavy use.
>
>>
>>
>> On Sat, Feb 5, 2011 at 11:58 AM, Ertio Lew <ertio...@gmail.com> wrote:
>>>
>>> Thanks Tyler !
>>>
>>> I could not fully understand why a larger number of column families
>>> would mean more memory. If you control parameters like
>>> memtable_throughput & memtable_operations, which are set on a per column
>>> family basis, then you can directly control & adjust memory use by
>>> splitting the memory space between the two CFs in proportion to what you
>>> would have used in a single CF.
>>> Hence there should be no extra memory consumption for multiple CFs
>>> that have been split from a single one?
>>>
>>> Regarding the compactions, I think even if there are more of them, the
>>> SSTable files to be compacted are smaller, as the data has been split
>>> into two.
>>> So: more compactions, but smaller ones too!
>>>
>>>
>>> Then, given the same amount of data, how can a greater number of column
>>> families be a bad option (if you split the values of the memory
>>> parameters proportionately)?
>>>
>>> --
>>> Regards,
>>> Ertio
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Feb 5, 2011 at 10:43 AM, Tyler Hobbs <ty...@datastax.com> wrote:
>>>>>
>>>>> I read somewhere that having more column families is not a good idea,
>>>>> as it consumes more memory and causes more compactions to occur.
>>>>
>>>> This is primarily true, but not in every case.
>>>>
>>>>> But the caching requirements may be different as they cater to two
>>>>> different features.
>>>>
>>>> This is a great reason to *not* merge them.  Besides the key and row
>>>> caches, don't forget about the OS buffer cache.
>>>>
>>>>> Is it recommended to merge these two column families into one?
>>>>> Thoughts?
>>>>
>>>> No, this sounds like an anti-pattern to me.  The overhead from having
>>>> two separate CFs is not that high.
>>>>
>>>> --
>>>> Tyler Hobbs
>>>> Software Engineer, DataStax
>>>> Maintainer of the pycassa Cassandra Python client library
>>>>
>>>>
>
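
The memory argument in this thread can be sketched with some simple
arithmetic. The following is only a rough illustration, not Cassandra code:
the function names, the 128 MB budget, and the write rates are all
hypothetical, and it models nothing beyond a memtable filling at a steady
rate until it hits a size threshold.

```python
def split_threshold(total_mb, write_rates):
    """Split one memtable size budget across CFs in proportion to write rate.

    total_mb and write_rates (MB written per minute, per CF) are
    hypothetical inputs for illustration, not Cassandra settings.
    """
    total_rate = sum(write_rates.values())
    return {cf: total_mb * rate / total_rate for cf, rate in write_rates.items()}


def minutes_until_full(threshold_mb, rate_mb_per_min):
    """Time for a memtable to reach its size threshold at a steady write rate."""
    return threshold_mb / rate_mb_per_min


rates = {"hot": 3, "cold": 1}  # assumed write rates in MB/min

# Proportional split: both memtables fill (and flush) on the same schedule.
proportional = split_threshold(128, rates)
for cf, mb in proportional.items():
    print(cf, mb, minutes_until_full(mb, rates[cf]))

# Equal split: the rarely written CF holds its memory three times longer.
for cf in rates:
    print(cf, 64, minutes_until_full(64, rates[cf]))
```

Under these assumed numbers, a proportional split gives the two CFs the
same total budget as a single CF and the same fill time, which is the point
made about splitting the thresholds. With thresholds that do not track the
write rates, the lightly written CF sits on unflushed data far longer than
the busy one, which is the situation where a lower time-based flush
threshold for that CF becomes worth considering.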
