OK, thanks. But maybe some kernel guy can help, or point me to a good resource to get educated, because I don't really get it.

The following is from our other small log cluster: 2 nodes with 8GB RAM each, where Cassandra has a 4GB max heap.

- We have disabled swap on all Cassandra servers.

- On the machine where I got the system OOM (not a Java OOM), I looked at dmesg and see:

In the Normal zone:
free:6604kB shows we have a problem
unevictable:4176332kB because we have JNA

and page cache info (DMA32 zone):
active_anon:3118624kB
active_file:0kB

If I understand it correctly, active_anon and active_file correspond to non-file-backed (anonymous) and file-backed pages, respectively.

Question 1: So if the res memory that is non-heap is actually mmap'd files, shouldn't it show up in active_file?

- I compared the /proc/meminfo of the node that was restarted with that of the other one, which survived. Active(anon) on the restarted server is ~100MB; on the other it is > 1GB.

Question 2: Could anyone point me to some resource which explains how file system cache usage of the page cache and process usage of the page cache are orchestrated?

I understand that the swap daemon only checks the page cache if the number of free pages is getting low. So if res memory is used up by the Java process (which is not controllable by -Xmx settings), it seems to compete with the file system cache. How is memory usage optimized so that the right parts of the files are in memory?

Thanks,
Daniel

snip from: dmesg

[7226178.039658] Node 0 DMA32 free:23420kB min:4816kB low:6020kB high:7224kB active_anon:3118624kB inactive_anon:20048kB active_file:0kB inactive_file:764kB unevictable:268156kB isolated(anon):0kB isolated(file):0kB present:3463804kB mlocked:268156kB dirty:0kB writeback:32kB mapped:1944kB shmem:120kB slab_reclaimable:1124kB slab_unreclaimable:1740kB kernel_stack:1368kB pagetables:9636kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:160 all_unreclaimable? yes
[7226178.039672] Node 0 Normal free:6604kB min:6652kB low:8312kB high:9976kB active_anon:412072kB inactive_anon:68800kB active_file:100kB inactive_file:340kB unevictable:4176332kB isolated(anon):0kB isolated(file):0kB present:4783356kB mlocked:4176332kB dirty:0kB writeback:28kB mapped:13600kB shmem:1376kB slab_reclaimable:5052kB slab_unreclaimable:12172kB kernel_stack:1400kB pagetables:11156kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:747 all_unreclaimable? yes
[7226178.039682] lowmem_reserve[]: 0 0 0 0
[7226178.039686] Node 0 DMA: 2*4kB 2*8kB 1*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15816kB
[7226178.039696] Node 0 DMA32: 214*4kB 166*8kB 165*16kB 77*32kB 42*64kB 30*128kB 14*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 23544kB
[7226178.039705] Node 0 Normal: 731*4kB 3*8kB 0*16kB 2*32kB 2*64kB 1*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 6852kB
[7226178.039714] 4707 total pagecache pages
[7226178.039715] 0 pages in swap cache
[7226178.039717] Swap cache stats: add 0, delete 0, find 0/0
[7226178.039719] Free swap = 0kB
[7226178.039720] Total swap = 0kB
[7226178.064611] 2097135 pages RAM
[7226178.064614] 47927 pages reserved
[7226178.064615] 14100 pages shared
[7226178.064616] 2031896 pages non-shared
[7226178.064620] Out of memory: kill process 11670 (java) score 13723 or a child

On Jul 4, 2011, at 2:42 PM, Jonathan Ellis wrote:

> mmap'd data will be attributed to res, but the OS can page it out
> instead of killing the process.
>
> On Mon, Jul 4, 2011 at 5:52 AM, Daniel Doubleday
> <daniel.double...@gmx.net> wrote:
>> Hi all,
>> we have a mem problem with cassandra. res goes up without bounds (well until
>> the os kills the process because we dont have swap)
>> I found a thread that's about the same problem but on OpenJDK:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Very-high-memory-utilization-not-caused-by-mmap-on-sstables-td5840777.html
>> We are on Debian with Sun JDK.
>> Resident mem is 7.4G while heap is restricted to 3G.
>> Anyone else is seeing this with Sun JDK?
>> Cheers,
>> Daniel
>> :/home/dd# java -version
>> java version "1.6.0_24"
>> Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
>> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
>> :/home/dd# ps aux |grep java
>> cass 28201 9.5 46.8 372659544 7707172 ? SLl May24 5656:21
>> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42
>> -Xms3000M -Xmx3000M -Xmn400M ...
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 28201 cass 20 0 355g 7.4g 1.4g S 8 46.9 5656:25 java
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
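
To make the zone numbers in the dmesg snip easier to compare, here is a minimal Python sketch that pulls the kB counters out of one zone line and checks two things: whether free is below the min watermark, and what share of the zone's present memory is mlocked. The helper name parse_zone_line and the regular expressions are my own assumptions about the line layout, not any official parser. The sample is the Node 0 Normal line from the snip, shortened to the counters used here.

```python
import re

# Node 0 Normal line from the dmesg snip, shortened to the counters used below.
NORMAL_LINE = (
    "[7226178.039672] Node 0 Normal free:6604kB min:6652kB low:8312kB "
    "high:9976kB active_anon:412072kB inactive_anon:68800kB "
    "active_file:100kB inactive_file:340kB unevictable:4176332kB "
    "present:4783356kB mlocked:4176332kB"
)


def parse_zone_line(line):
    """Hypothetical helper: return (zone name, {counter: kB}) for one zone line."""
    zone_match = re.search(r"Node \d+ (\w+) ", line)
    zone = zone_match.group(1) if zone_match else None
    # Counters look like "name:123kB"; names may contain parentheses,
    # e.g. isolated(anon). Entries without a kB suffix are skipped.
    counters = {
        name: int(kb) for name, kb in re.findall(r"([\w()]+):(\d+)kB", line)
    }
    return zone, counters


zone, counters = parse_zone_line(NORMAL_LINE)
below_min = counters["free"] < counters["min"]
mlocked_share = counters["mlocked"] / counters["present"]

print(f"{zone}: free below min watermark: {below_min}")
print(f"{zone}: mlocked share of present memory: {mlocked_share:.0%}")
```

With the numbers from the snip, this reports that free is below min in the Normal zone and that most of that zone is mlocked, which would fit the kernel having almost nothing left to reclaim there.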