Thanks all for the help. I ran the traffic over the weekend. Surprisingly, my heap was doing OK (around 5.7G of 8G), but GC activity went nuts and dropped the throughput. I will probably increase the number of nodes.
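For anyone else chasing GC-driven throughput drops: it can help to turn on the JVM's GC logging so you can see pause lengths and promotion behavior directly. A minimal sketch for conf/cassandra-env.sh, using standard HotSpot flags of that era (the log path is illustrative):

```shell
# Append to conf/cassandra-env.sh (log path is an example, adjust to taste)
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"                 # per-collection heap breakdown
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"              # wall-clock timestamps on each event
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"  # total stop-the-world time
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"   # write to a file instead of stdout
```

Long or frequent stop-the-world entries in the resulting log are the usual sign that the heap, not disk, is the bottleneck.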
The other interesting thing I noticed was that there were some objects with finalize() methods; this could potentially cause GC issues.

On Fri, May 31, 2013 at 1:47 AM, Aiman Parvaiz <ai...@grapheffect.com> wrote:

> I believe you should roll out more nodes as a temporary fix to your
> problem. 400GB on all nodes means (as correctly mentioned in other mails of
> this thread) you are spending more time on GC. Check out the second comment
> in this link by Aaron Morton; he says that more than 300GB can be
> problematic. Though this post is about an older version of Cassandra, I
> believe the concept still holds true:
>
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-it-safe-to-stop-a-read-repair-and-any-suggestion-on-speeding-up-repairs-td6607367.html
>
> Thanks
>
> On May 29, 2013, at 9:32 PM, srmore <comom...@gmail.com> wrote:
>
> Hello,
> I am observing that my performance decreases drastically as my data
> size grows. I have a 3-node cluster with 64 GB of RAM, and my data size is
> around 400GB on all the nodes. I also see that when I restart Cassandra
> the performance goes back to normal and then starts decreasing again after
> some time.
>
> Some hunting landed me on this page,
> http://wiki.apache.org/cassandra/LargeDataSetConsiderations, which talks
> about large data sets and explains that it might be because I am going
> through multiple layers of OS cache, but it does not tell me how to tune them.
>
> So, my question is: are there any optimizations I can do to handle
> these large datasets?
>
> And why does my performance go back to normal when I restart Cassandra?
>
> Thanks!
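On the finalize() point above: an overridden finalize() registers every instance with the JVM's finalizer queue, so the object needs at least two GC cycles to be reclaimed and a single finalizer thread has to drain the queue, which adds GC pressure. The usual fix is explicit cleanup via AutoCloseable and try-with-resources. A minimal sketch (the NativeBuffer class and its fields are hypothetical, not from the thread):

```java
// Hypothetical resource holder: deterministic cleanup via close() instead of
// finalize(), so instances are reclaimable after a single GC cycle and never
// touch the finalizer queue.
public class NativeBuffer implements AutoCloseable {
    private byte[] data;             // stand-in for an off-heap/native resource
    private boolean released = false;

    public NativeBuffer(int size) {
        this.data = new byte[size];
    }

    public boolean isReleased() {
        return released;
    }

    @Override
    public void close() {            // explicit, deterministic release
        if (!released) {
            data = null;
            released = true;
        }
    }

    public static void main(String[] args) {
        // try-with-resources guarantees close() runs when the block exits
        try (NativeBuffer buf = new NativeBuffer(1024)) {
            System.out.println("released inside try: " + buf.isReleased());
        } // buf.close() has been called here
    }
}
```

Tools like `jmap -histo <pid>` can show whether `java.lang.ref.Finalizer` instances are piling up on the heap, which is a quick way to confirm finalizable objects are the culprit.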