Upgrade Java: 1.6.0_21 has memory leaks; go to the latest, 1.6.0_32. It is abnormal that with 80 GB of data you end up with 15 GB resident for the index.

vfs_cache_pressure controls how readily the kernel reclaims the inode and dentry caches. Also, to check whether you really have a memory leak (as opposed to page cache growth), use the drop_caches sysctl.
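To inspect and adjust those from the shell, something like this (a minimal sketch; run as root, and treat the values as illustrative rather than tuned for your box):

    # current values
    sysctl vm.swappiness vm.vfs_cache_pressure

    # below the default of 100, the kernel prefers to keep
    # inode/dentry caches instead of reclaiming them
    sysctl -w vm.vfs_cache_pressure=50

    # drop clean page cache plus dentries and inodes, then watch whether
    # RES grows back without bound (leak) or just refills (normal caching)
    sync
    echo 3 > /proc/sys/vm/drop_caches

Note that drop_caches only discards clean, reclaimable cache; it is safe, but the next reads will be cold.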
2012/6/14 Gurpreet Singh <gurpreet.si...@gmail.com>:
> JNA is installed. swappiness was 0. vfs_cache_pressure was 100. Two questions on this:
> 1. Is there a way to find out if mlockall really worked, other than just the "mlockall successful" log message?
> 2. Does cassandra mlock only the jvm heap, or also the mmapped memory?
>
> I disabled mmap completely, and things look so much better. Latency is, surprisingly, half of what I see when I have mmap enabled. It's funny that I keep reading tall claims about mmap, but in practice a lot of people have problems with it, especially when it uses up all the memory. We have tried mmap for different purposes in our company before, and finally ended up disabling it, because it just doesn't handle things right when memory is low. Maybe /proc/sys/vm needs to be configured right, but that's not the easiest of configurations to get right.
>
> Right now, I am handling only 80 gigs of data. Kernel version is 2.6.26. Java version is 1.6.0_21.
> /G
>
> On Wed, Jun 13, 2012 at 8:42 PM, Al Tobey <a...@ooyala.com> wrote:
>>
>> I would check /etc/sysctl.conf and get the values of /proc/sys/vm/swappiness and /proc/sys/vm/vfs_cache_pressure.
>>
>> If you don't have JNA enabled (which Cassandra uses to fadvise) and swappiness is at its default of 60, the Linux kernel will happily swap out your heap for cache space. Set swappiness to 1 or 'swapoff -a', and kswapd shouldn't be doing much unless you have a too-large heap or some other app using up memory on the system.
>>
>> On Wed, Jun 13, 2012 at 11:30 AM, ruslan usifov <ruslan.usi...@gmail.com> wrote:
>>>
>>> Hm, it's very strange. What amount of data do you have? Your Linux kernel version? Java version?
>>>
>>> PS: I can suggest switching disk_access_mode to standard in your case.
>>> PS PS: Also upgrade your Linux to the latest, and Java HotSpot to 1.6.0_32 (from the Oracle site).
>>>
>>> 2012/6/13 Gurpreet Singh <gurpreet.si...@gmail.com>:
>>> > Alright, here it goes again...
>>> > Even with mmap_index_only, once the RES memory hit 15 gigs, the read latency went berserk. This happens in 12 hours if disk_access_mode is mmap, about 48 hrs if it's mmap_index_only.
>>> >
>>> > Only reads happening, at 50 reads/second.
>>> > row cache size: 730 mb, row cache hit ratio: 0.75
>>> > key cache size: 400 mb, key cache hit ratio: 0.4
>>> > heap size (max 8 gigs): used 6.1-6.9 gigs
>>> >
>>> > No messages about reducing cache sizes in the logs.
>>> >
>>> > Stats:
>>> > vmstat 1: no swapping here, however high sys cpu utilization
>>> > iostat (looks great): avgqu-sz = 8, await = 7 ms, svctm = 0.6 ms, %util = 15-30%
>>> > top: VIRT 19.8g, SHR 6.1g, RES 15g, high cpu, buffers 2 mb
>>> > cfstats: 70-100 ms; this number used to be 20-30 ms
>>> >
>>> > The value of SHR keeps increasing (owing to mmap, I guess), while at the same time buffers keep decreasing. Buffers start as high as 50 mb and go down to 2 mb.
>>> >
>>> > This is very easily reproducible for me. Every time the RES memory hits about 15 gigs, the client starts getting timeouts from cassandra and the sys cpu jumps a lot. All this even though my row cache hit ratio is almost 0.75.
>>> >
>>> > Other than just turning off mmap completely, is there any other solution or setting to avoid a cassandra restart every couple of days?
>>> > Something to keep the RES memory from hitting such a high number. I have been constantly monitoring the RES, and was not seeing issues when RES was at 14 gigs.
>>> > /G
>>> >
>>> > On Fri, Jun 8, 2012 at 10:02 PM, Gurpreet Singh <gurpreet.si...@gmail.com> wrote:
>>> >>
>>> >> Aaron, Ruslan,
>>> >> I changed the disk access mode to mmap_index_only, and it has been stable ever since, well at least for the past 20 hours. Previously, in about 10-12 hours, as soon as the resident memory was full, the client would start timing out on all its reads. It looks fine for now; I am going to let it continue, to see how long it lasts and if the problem comes again.
>>> >>
>>> >> Aaron, yes, I had turned swap off.
>>> >>
>>> >> The total cpu utilization was at roughly 700%. It looked like kswapd0 was using just 1 cpu, but cassandra (jsvc) cpu utilization increased quite a bit. top was reporting high system cpu and low user cpu. vmstat was not showing swapping. Java heap size max is 8 gigs, while only 4 gigs was in use, so the java heap was doing great; no gc in the logs. iostat was doing ok from what I remember; I will have to reproduce the issue for the exact numbers.
>>> >>
>>> >> cfstats latency had gone very high, but that is partly due to the high cpu usage.
>>> >>
>>> >> One thing was clear: SHR was inching higher (due to the mmap), while the buffer cache, which started at about 20-25 mb, was down to 2 mb by the end, which probably means the page cache was being evicted by kswapd0. Is there a way to fix the size of the buffer cache and not let the system evict it in favour of mmap?
>>> >>
>>> >> Also, mmapping data files would basically cause not only the data asked for to be read into main memory, but also a bunch of extra pages (readahead), which would not be very useful, right? The same thing for the index would actually be more useful, as there would be more index entries in the readahead part, and the index files being small wouldn't cause so much memory pressure that the page cache gets evicted. mmapping the data files would make sense if the data size, or at least the hot data set, is smaller than the RAM; otherwise just the index would probably be a better thing to mmap, no? In my case the data size is 85 gigs, while available RAM is 16 gigs (only 8 gigs after heap).
>>> >>
>>> >> /G
>>> >>
>>> >> On Fri, Jun 8, 2012 at 11:44 AM, aaron morton <aa...@thelastpickle.com> wrote:
>>> >>>
>>> >>> Ruslan,
>>> >>> Why did you suggest changing the disk_access_mode?
>>> >>>
>>> >>> Gurpreet,
>>> >>> I would leave the disk_access_mode at the default until you have a reason to change it.
>>> >>>
>>> >>>> > 8 core, 16 gb ram, 6 data disks raid0, no swap configured
>>> >>>
>>> >>> Is swap disabled?
>>> >>>
>>> >>>> Gradually, the system cpu becomes high, almost 70%, and the client starts getting continuous timeouts
>>> >>>
>>> >>> 70% of one core or 70% of all cores?
>>> >>> Check the server logs: is there GC activity?
>>> >>> Check nodetool cfstats to see the read latency for the cf.
>>> >>> Take a look at vmstat to see if you are swapping, and look at iostat to see if io is the problem:
>>> >>> http://spyced.blogspot.co.nz/2010/01/linux-performance-basics.html
>>> >>>
>>> >>> Cheers
>>> >>>
>>> >>> -----------------
>>> >>> Aaron Morton
>>> >>> Freelance Developer
>>> >>> @aaronmorton
>>> >>> http://www.thelastpickle.com
>>> >>>
>>> >>> On 8/06/2012, at 9:00 PM, Gurpreet Singh wrote:
>>> >>>
>>> >>> Thanks Ruslan, I will try the mmap_index_only.
>>> >>> Is there any guideline as to when to leave it to auto and when to use mmap_index_only?
>>> >>>
>>> >>> /G
>>> >>>
>>> >>> On Fri, Jun 8, 2012 at 1:21 AM, ruslan usifov <ruslan.usi...@gmail.com> wrote:
>>> >>>>
>>> >>>> disk_access_mode: mmap??
>>> >>>>
>>> >>>> Set disk_access_mode: mmap_index_only in cassandra.yaml.
>>> >>>>
>>> >>>> 2012/6/8 Gurpreet Singh <gurpreet.si...@gmail.com>:
>>> >>>> > Hi,
>>> >>>> > I am testing cassandra 1.1 on a 1 node cluster:
>>> >>>> > 8 core, 16 gb ram, 6 data disks raid0, no swap configured
>>> >>>> >
>>> >>>> > cassandra 1.1.1
>>> >>>> > heap size: 8 gigs
>>> >>>> > key cache size in mb: 800 (used only 200 mb till now)
>>> >>>> > memtable_total_space_in_mb: 2048
>>> >>>> >
>>> >>>> > I am running a read workload, about 30 reads/second, no writes at all. The system runs fine for roughly 12 hours.
>>> >>>> >
>>> >>>> > jconsole shows that my heap size has hardly touched 4 gigs.
>>> >>>> > top shows:
>>> >>>> > SHR increasing slowly from 100 mb to 6.6 gigs in these 12 hrs
>>> >>>> > RES increasing slowly from 6 gigs all the way to 15 gigs
>>> >>>> > buffers at a healthy 25 mb at some point, going down to 2 mb in these 12 hrs
>>> >>>> > VIRT staying at 85 gigs
>>> >>>> >
>>> >>>> > I understand that SHR goes up because of mmap, and RES goes up because it includes the SHR value as well.
>>> >>>> >
>>> >>>> > After around 10-12 hrs, the cpu utilization of the system starts increasing, and I notice that the kswapd0 process starts becoming more active. Gradually the system cpu becomes high, almost 70%, and the client starts getting continuous timeouts. The fact that buffers went down from 20 mb to 2 mb suggests that kswapd0 is probably evicting the page cache.
>>> >>>> >
>>> >>>> > Is there a way to stop kswapd0 from doing this even when there is no swap configured? This is very easily reproducible for me, and I would like a way out of this situation. Do I need to adjust vm memory management stuff like pagecache, vfs_cache_pressure, things like that?
>>> >>>> >
>>> >>>> > Just some extra information: jna is installed, mlockall is successful, and there is no compaction running.
>>> >>>> > Would appreciate any help on this.
>>> >>>> > Thanks
>>> >>>> > Gurpreet
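
Re the question raised earlier in the thread, of how to tell whether mlockall really worked beyond the log line: you can verify it from /proc. A rough sketch, assuming Linux; the pgrep pattern is an assumption, adjust it to match your Cassandra JVM:

    # PID of the Cassandra JVM (the match pattern is an assumption)
    CASS_PID=$(pgrep -f CassandraDaemon)

    # VmLck is the amount of memory the process has locked with mlock/mlockall;
    # a non-zero value means the lock really took effect
    grep VmLck /proc/$CASS_PID/status

    # the swap/io triage suggested above, standard procps/sysstat tools
    vmstat 1 5       # si/so columns: any swap activity?
    iostat -x 5 3    # await and %util: is the disk the bottleneck?

If memory serves, Cassandra's JNA helper calls mlockall(MCL_CURRENT), so pages mapped after startup (the mmapped SSTables) would not be locked, which also speaks to question 2 above; treat that as a reading of the code, not a guarantee.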