Thanks for adding that, Benjamin!
On Wed, Feb 9, 2011 at 1:40 AM, Benjamin Coverston <ben.covers...@datastax.com> wrote:
>
> On 2/4/11 11:58 PM, Ertio Lew wrote:
>>
>> Yes, a disadvantage of more no. of CF in terms of memory utilization
>> which I see is: -
>>
>> if some CF is written less often as compared to other CFs, then the
>> memtable would consume space in the memory until it is flushed, this
>> memory space could have been much better used by a CF that's heavily
>> written and read. And if you try to make the thresholds for flush
>> smaller then more compactions would be needed.
>>
> One more disadvantage here is that with CFs that vary widely in the write
> rate you can also end up with fragmented commit logs which in some cases we
> have seen actually fill up the commit log partition. As a consequence one
> thing to consider would be to lower the commit log flush threshold (in
> minutes) to something lower for the column families that do not see heavy
> use.
>
>> On Sat, Feb 5, 2011 at 11:58 AM, Ertio Lew <ertio...@gmail.com> wrote:
>>>
>>> Thanks Tyler !
>>>
>>> I could not fully understand the reason why more no of column families
>>> would mean more memory.. if you have under control parameters like
>>> memtable_throughput & memtable_operations which are set per column
>>> family basis then you can directly control & adjust by splitting the
>>> memory space between two CFs in proportion to what you would do in
>>> single CF.
>>> Hence there should be no extra memory consumption for multiple CFs
>>> that have been split from single one??
>>>
>>> Regarding the compactions, I think even if they are more the size of
>>> the SST files to be compacted is smaller as the data has been split
>>> into two.
>>> Then more compactions but smaller too!!
>>>
>>> Then, provided the same amount of data, how can greater no of column
>>> families could be a bad option(if you split the values of parameters
>>> for memory consumption proportionately) ??
>>>
>>> --
>>> Regards,
>>> Ertio
>>>
>>> On Sat, Feb 5, 2011 at 10:43 AM, Tyler Hobbs <ty...@datastax.com> wrote:
>>>>>
>>>>> I read somewhere that more no of column families is not a good idea as
>>>>> it consumes more memory and more compactions to occur
>>>>
>>>> This is primarily true, but not in every case.
>>>>
>>>>> But the caching requirements may be different as they cater to two
>>>>> different features.
>>>>
>>>> This is a great reason to *not* merge them. Besides the key and row
>>>> caches, don't forget about the OS buffer cache.
>>>>
>>>>> Is it recommended to merge these two column families into one ??
>>>>> Thoughts ?
>>>>
>>>> No, this sounds like an anti-pattern to me. The overhead from having
>>>> two separate CFs is not that high.
>>>>
>>>> --
>>>> Tyler Hobbs
>>>> Software Engineer, DataStax
>>>> Maintainer of the pycassa Cassandra Python client library
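
To make the memory argument in the thread concrete, here is a rough back-of-the-envelope sketch in Python. It is plain arithmetic, not Cassandra code: the function names, the 64 MB budget, and the write rates are all made-up assumptions for illustration, and it ignores per-memtable overhead, compaction cost, and the commit log. It compares splitting one flush-threshold budget between two CFs in proportion to their write rates (the suggestion above) with splitting it evenly.

```python
# Illustrative arithmetic only. The names and numbers below are hypothetical;
# they are not Cassandra settings, defaults, or APIs.

def split_budget(total_mb, write_rates):
    """Split a memtable size budget across CFs in proportion to write rate."""
    total_rate = sum(write_rates.values())
    return {cf: total_mb * rate / total_rate for cf, rate in write_rates.items()}


def minutes_to_fill(threshold_mb, write_rate_mb_per_min):
    """Minutes for a memtable to reach its flush threshold at a steady rate."""
    return threshold_mb / write_rate_mb_per_min


# Assumed workload: one CF with a 64 MB threshold is split into a heavily
# written CF and a rarely written one.
write_rates = {"hot_cf": 9.0, "cold_cf": 1.0}  # MB written per minute (assumed)

proportional = split_budget(64, write_rates)
even = {cf: 64 / len(write_rates) for cf in write_rates}

for label, budgets in (("proportional", proportional), ("even", even)):
    for cf, mb in budgets.items():
        mins = minutes_to_fill(mb, write_rates[cf])
        print(f"{label:12s} {cf:8s} threshold={mb:5.1f} MB  fills in {mins:5.1f} min")
```

Under these assumptions the proportional split keeps the total at 64 MB and both memtables reach their threshold after about 6.4 minutes, which matches the point that the split by itself need not cost extra memory. With the even split, the rarely written CF can hold up to 32 MB for about 32 minutes before flushing, which is the idle-memory concern raised earlier in the thread.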