On Mon, Oct 11, 2010 at 9:13 AM, Ran Tavory <ran...@gmail.com> wrote:

> In my production cluster I've been seeing the following pattern.
> When a node comes up it operates smoothly for a few days, but then the
> node starts to show excessive CPU usage. I see GC activity (which may
> also be excessive, I'm not sure), and sometimes the node is dropped out of
> the ring due to unresponsiveness. If I leave things untouched for a few
> more days, eventually all nodes in the cluster end up serving slow reads and
> clients start timing out. A node restart solves the problem for a few more
> days.
> So far the only trigger for this behavior is "number of days that have
> passed since the last node restart", usually 4-5 days. One easy solution
> is to restart nodes every couple of days, but that's lame...
>
*snip*

>       <ColumnFamily CompareWith="BytesType" Name="KvAds"
>                     KeysCached="0"
>                     RowsCached="10000000"/>
>

That's a large row cache, and it might be exerting heavy GC pressure on the JVM.
Try setting the key cache to 100% and disabling the row cache instead.
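
A sketch of that change, assuming the same storage-conf.xml attribute style as
the definition quoted above (KeysCached taking a percentage, and RowsCached="0"
meaning no row cache); check it against the config comments for your version:

```xml
<!-- Assumed values: cache every key location, keep no whole rows on the heap -->
<ColumnFamily CompareWith="BytesType" Name="KvAds"
              KeysCached="100%"
              RowsCached="0"/>
```

The key cache holds only key locations rather than entire rows, so it should
put far less long-lived data on the heap than a 10,000,000-row cache.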

-Brandon
