Mina, 
        That does not sound right. 

        If you have the time, can you create a Jira ticket describing the 
problem? Please include:

* the GC logs, gathered by enabling the options here (see the note on the 
flags after this list): 
https://github.com/apache/cassandra/blob/trunk/conf/cassandra-env.sh#L165 (it 
would be good to capture the node actually getting into trouble, if possible).
* OS, JVM and cassandra versions
* information on the schema and workload
* anything else you think is important. 
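
For reference, the commented-out block at that line enables the standard 
HotSpot GC-logging flags, along the lines of the following (check the file 
for the exact set in your version):

    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+PrintHeapAtGC
    -XX:+PrintTenuringDistribution
    -XX:+PrintGCApplicationStoppedTime
    -Xloggc:/var/log/cassandra/gc.log

Uncomment them and restart the node; the resulting log shows heap use before 
and after every ParNew and CMS cycle, which is what we need to see here.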

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 7:24 AM, Mina Naguib wrote:

> 
> Hi Wade
> 
> I don't know if your scenario matches mine, but I've been struggling with 
> memory pressure in 1.x as well.  I made the jump from 0.7.9 to 1.1.0, along 
> with enabling compression and levelled compactions, so I don't know which 
> specifically is the main culprit.
> 
> Specifically, all my nodes seem to "lose" heap memory.  As parnew and CMS do 
> their job, over any reasonable period of time, the "floor" of memory after a 
> GC keeps rising.  This is quite visible if you leave jconsole connected for a 
> day or so, and manifests itself as a funny-looking cone like so: 
> http://mina.naguib.ca/images/cassandra_jconsole.png
> 
> Once memory pressure reaches the point where the heap can't be kept 
> reliably below 75%, cassandra goes into survival mode - via a bunch of 
> tunables in cassandra.yaml (flush_largest_memtables_at and friends) it'll 
> do things like flush memtables, drop caches, etc - all of which, in my 
> experience, especially with the recent off-heap data structures, 
> exacerbate the problem.
> 
> I've been meaning, of course, to collect enough technical data to file a bug 
> report, but haven't had the time.  I have not yet tested 1.1.1 to see if it 
> improves the situation.
> 
> What I have found, however, is a band-aid, which you can see at the 
> rightmost section of the graph in the screenshot I posted.  It is simply 
> to hit the "Perform GC" button in jconsole.  It seems that a full 
> System.gc() *DOES* reclaim heap memory that parnew and CMS fail to 
> reclaim.
> 
> On my production cluster I have a full-GC via JMX scheduled in a rolling 
> fashion every 4 hours.  It's extremely expensive (20-40 seconds of 
> unresponsiveness) but is a necessary evil in my situation.  Without it, my 
> nodes enter a nasty spiral of constant flushing, constant compactions, high 
> heap usage, instability and high latency.
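> 
> In case it's useful, the JMX call behind that scheduled job is tiny - 
> something like this untested sketch (args[0] is the node's hostname; 7199 
> is Cassandra's default JMX port, adjust to taste):
> 
> import java.lang.management.ManagementFactory;
> import java.lang.management.MemoryMXBean;
> import javax.management.MBeanServerConnection;
> import javax.management.remote.JMXConnector;
> import javax.management.remote.JMXConnectorFactory;
> import javax.management.remote.JMXServiceURL;
> 
> public class ForceFullGc {
>     public static void main(String[] args) throws Exception {
>         JMXServiceURL url = new JMXServiceURL(
>                 "service:jmx:rmi:///jndi/rmi://" + args[0] + ":7199/jmxrmi");
>         JMXConnector conn = JMXConnectorFactory.connect(url);
>         try {
>             MBeanServerConnection mbsc = conn.getMBeanServerConnection();
>             MemoryMXBean mem = ManagementFactory.newPlatformMXBeanProxy(
>                     mbsc, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
>             mem.gc();  // same as the "Perform GC" button: a remote System.gc()
>         } finally {
>             conn.close();
>         }
>     }
> }
> 
> Pointed at one node at a time from cron, that gives the rolling 4-hourly 
> sweep.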
> 
> 
> On 2012-06-05, at 2:56 PM, Poziombka, Wade L wrote:
> 
>> Alas, upgrading to 1.1.1 did not solve my issue.
>> 
>> -----Original Message-----
>> From: Brandon Williams [mailto:dri...@gmail.com] 
>> Sent: Monday, June 04, 2012 11:24 PM
>> To: user@cassandra.apache.org
>> Subject: Re: memory issue on 1.1.0
>> 
>> Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741
>> 
>> -Brandon
>> 
>> On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L 
>> <wade.l.poziom...@intel.com> wrote:
>>> Running a very write-intensive (new column, delete old column, etc.) 
>>> process and failing on memory.  Log file attached.
>>> 
>>> Curiously, I have never seen this when adding new data - in the past I 
>>> have sent hundreds of millions of "new" transactions without trouble.  
>>> It only seems to happen when I modify.  My process is as follows:
>>> 
>>> I use a key slice to get the columns to modify in batches of 100, and 
>>> modify those columns in separate threads.  I advance the slice by 
>>> setting the start key to the last key of the previous batch (sketched 
>>> below).  The mutations are: update a column value in one column family 
>>> (token), and delete a column and add a new column in another (pan).
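>>> 
>>> In outline, the loop is something like this (untested sketch - Row, 
>>> fetchRows(), mutate() and pool (an ExecutorService) are hypothetical 
>>> stand-ins for the real Thrift/Hector client calls):
>>> 
>>>     String start = "";                  // empty key = start of the range
>>>     final int PAGE = 100;
>>>     while (true) {
>>>         final List<Row> batch = fetchRows(start, PAGE); // key-range slice
>>>         if (batch.isEmpty()) break;
>>>         pool.submit(new Runnable() {    // mutate the batch on a worker thread
>>>             public void run() { mutate(batch); }
>>>         });
>>>         start = batch.get(batch.size() - 1).getKey(); // advance past this batch
>>>         if (batch.size() < PAGE) break; // short page = end of range
>>>     }
>>> 
>>> (One wrinkle: the range start key is inclusive, so every page after the 
>>> first returns the previous batch's last row again - the mutations have 
>>> to be safe to re-apply to it.)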
>>> 
>>> It runs well until about 5 million rows, then it seems to run out of 
>>> memory.  Note that these column families are quite small.
>>> 
>>> WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 
>>> 145) Heap is 0.7967470834946492 full.  You may need to reduce memtable 
>>> and/or cache sizes.  Cassandra will now flush up to the two largest 
>>> memtables to free up memory.  Adjust flush_largest_memtables_at 
>>> threshold in cassandra.yaml if you don't want Cassandra to do this 
>>> automatically
>>> INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java 
>>> (line 2772) Unable to reduce heap usage since there are no dirty 
>>> column families
>>> INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) 
>>> InetAddress /10.230.34.170 is now UP
>>> INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java 
>>> (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; 
>>> max is 8506048512
>>> INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java 
>>> (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 
>>> 5714800208 used; max is 8506048512
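>>> 
>>> (Doing the arithmetic on those two lines: after the ParNew the heap sits 
>>> at 7345969520 / 8506048512 ≈ 86% of max, and even a 12.8-second CMS only 
>>> gets it down to 5714800208 / 8506048512 ≈ 67% - not far below the 75% 
>>> flush threshold, so it climbs right back into trouble.)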
>>> 
>>> ----------------
>>> Keyspace: keyspace
>>>       Read Count: 50042632
>>>       Read Latency: 0.23157864418482224 ms.
>>>       Write Count: 44948323
>>>       Write Latency: 0.019460829472992797 ms.
>>>       Pending Tasks: 0
>>>               Column Family: pan
>>>               SSTable count: 5
>>>               Space used (live): 1977467326
>>>               Space used (total): 1977467326
>>>               Number of Keys (estimate): 16334848
>>>               Memtable Columns Count: 0
>>>               Memtable Data Size: 0
>>>               Memtable Switch Count: 74
>>>               Read Count: 14985122
>>>               Read Latency: 0.408 ms.
>>>               Write Count: 19972441
>>>               Write Latency: 0.022 ms.
>>>               Pending Tasks: 0
>>>               Bloom Filter False Postives: 829
>>>               Bloom Filter False Ratio: 0.00073
>>>               Bloom Filter Space Used: 37048400
>>>               Compacted row minimum size: 125
>>>               Compacted row maximum size: 149
>>>               Compacted row mean size: 149
>>> 
>>>               Column Family: token
>>>               SSTable count: 4
>>>               Space used (live): 1250973873
>>>               Space used (total): 1250973873
>>>               Number of Keys (estimate): 14217216
>>>               Memtable Columns Count: 0
>>>               Memtable Data Size: 0
>>>               Memtable Switch Count: 49
>>>               Read Count: 30059563
>>>               Read Latency: 0.167 ms.
>>>               Write Count: 14985488
>>>               Write Latency: 0.014 ms.
>>>               Pending Tasks: 0
>>>               Bloom Filter False Postives: 13642
>>>               Bloom Filter False Ratio: 0.00322
>>>               Bloom Filter Space Used: 28002984
>>>               Compacted row minimum size: 150
>>>               Compacted row maximum size: 258
>>>               Compacted row mean size: 224
>>> 
>>>               Column Family: counters
>>>               SSTable count: 2
>>>               Space used (live): 561549994
>>>               Space used (total): 561549994
>>>               Number of Keys (estimate): 9985024
>>>               Memtable Columns Count: 0
>>>               Memtable Data Size: 0
>>>               Memtable Switch Count: 38
>>>               Read Count: 4997947
>>>               Read Latency: 0.092 ms.
>>>               Write Count: 9990394
>>>               Write Latency: 0.023 ms.
>>>               Pending Tasks: 0
>>>               Bloom Filter False Postives: 191
>>>               Bloom Filter False Ratio: 0.37525
>>>               Bloom Filter Space Used: 18741152
>>>               Compacted row minimum size: 125
>>>               Compacted row maximum size: 179
>>>               Compacted row mean size: 150
>>> 
>>> ----------------
> 
