The problem still seems to be the JVM's C heap, which leaks about 70MB every day.
Here is the summary:

on 12/19: 00000000010c3000 178548K rw---    [ anon ]
on 12/18: 00000000010c3000 110320K rw---    [ anon ]
on 12/17: 00000000010c3000  39256K rw---    [ anon ]
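The daily growth can be checked from the three pmap sizes above. The class below only redoes the subtraction on those quoted numbers (AnonGrowth is a made-up name for illustration). It gives increases of 71064 KB and 68228 KB, i.e. roughly 70MB per day.

```java
// Day-over-day growth of the anonymous region at 00000000010c3000,
// computed from the pmap sizes (KB) quoted above.
final class AnonGrowth {

    // Difference between each sample and the previous one, in KB.
    static long[] dailyDeltasKb(long[] sizesKb) {
        long[] deltas = new long[sizesKb.length - 1];
        for (int i = 1; i < sizesKb.length; i++) {
            deltas[i - 1] = sizesKb[i] - sizesKb[i - 1];
        }
        return deltas;
    }

    public static void main(String[] args) {
        long[] sizesKb = {39256, 110320, 178548}; // 12/17, 12/18, 12/19
        for (long delta : dailyDeltasKb(sizesKb)) {
            System.out.printf("+%d KB (%.1f MiB)%n", delta, delta / 1024.0);
        }
    }
}
```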

This should not be the JVM object heap, because the object heap size is
fixed by the JVM settings below. Here is the mapping of the JVM object heap,
which remains constant:

00000000010c3000  39256K rw---    [ anon ]

I'll post this to the OpenJDK mailing list to ask for help.

> Zhu,
> A couple of quick questions:
>  How many threads are in your JVM?
>

There are hundreds of threads. Here are the Cassandra settings:

  <ConcurrentReads>8</ConcurrentReads>
  <ConcurrentWrites>128</ConcurrentWrites>

The thread stack size on this server is 1MB, so I see hundreds of individual
1MB mmap segments.
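A rough way to confirm the thread count, and the stack memory it implies, is the standard ThreadMXBean. This sketch assumes every thread uses the 1MB stack size mentioned above, and it reports on the JVM it runs in; for the Cassandra process, the same ThreadCount and PeakThreadCount attributes can be read from the java.lang:type=Threading MBean over JMX. StackEstimate is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class StackEstimate {

    // Assumption: every thread reserves the same stack size (1MB on this
    // server, per the text above). Real per-thread reservations can differ.
    static long reservedStackKb(int threadCount, long stackKb) {
        return threadCount * stackKb;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        int live = threads.getThreadCount();
        int peak = threads.getPeakThreadCount();
        System.out.printf("live=%d peak=%d, about %d KB of thread stacks at 1MB each%n",
                live, peak, reservedStackKb(live, 1024));
    }
}
```

These stacks appear as the many separate 1MB segments, so they do not explain a single region that keeps growing.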

> Can you also post the full command line?
>
Sure. All of them are default settings.

/usr/bin/java -ea -Xms1G -Xmx1G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8080
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dstorage-config=bin/../conf -cp
bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.8.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar
org.apache.cassandra.thrift.CassandraDaemon
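Whether the object heap is really fixed can be checked from the flags the JVM was started with: with -Xms equal to -Xmx (both 1G here), the Java heap should not grow. This sketch reads the flags through RuntimeMXBean.getInputArguments(). The HeapFlags helper, and its rule of comparing the last -Xms and -Xmx occurrences as plain strings, are assumptions for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class HeapFlags {

    // Value of the last flag starting with prefix (e.g. "-Xmx"), or null.
    // Assumption: a later occurrence overrides an earlier one.
    static String lastFlag(List<String> jvmArgs, String prefix) {
        String value = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length());
            }
        }
        return value;
    }

    // True when -Xms and -Xmx are both present and spelled identically,
    // i.e. the Java object heap is not allowed to grow or shrink.
    static boolean heapIsFixed(List<String> jvmArgs) {
        String xms = lastFlag(jvmArgs, "-Xms");
        String xmx = lastFlag(jvmArgs, "-Xmx");
        return xms != null && xms.equals(xmx);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM flags: " + jvmArgs);
        System.out.println("Fixed-size heap: " + heapIsFixed(jvmArgs));
    }
}
```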


> Also, the output of cat /proc/meminfo?
>

This is an OpenVZ-based test environment, so /proc/meminfo is not very
helpful. Anyway, here it is:


MemTotal:      9838380 kB
MemFree:       4005900 kB
Buffers:             0 kB
Cached:              0 kB
SwapCached:          0 kB
Active:              0 kB
Inactive:            0 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      9838380 kB
LowFree:       4005900 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:               0 kB
Writeback:           0 kB
AnonPages:           0 kB
Mapped:              0 kB
Slab:                0 kB
PageTables:          0 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:         0 kB
Committed_AS:        0 kB
VmallocTotal:        0 kB
VmallocUsed:         0 kB
VmallocChunk:        0 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB


> thanks,
> Sri
>
> On Fri, Dec 17, 2010 at 7:15 PM, Zhu Han <schumi....@gmail.com> wrote:
>
> > It seems the problem is still there after I upgraded to "OpenJDK Runtime
> > Environment (IcedTea6 1.9.2)". So it is not related to the bug I reported
> > two days ago.
> >
> > Can somebody else share some info with us? What Java environment do you
> > use? Is it stable for long-lived Cassandra instances?
> >
> > best regards,
> > hanzhu
> >
> >
> > On Thu, Dec 16, 2010 at 9:28 PM, Zhu Han <schumi....@gmail.com> wrote:
> >
> > > I tried it this afternoon, but it did not work for me.
> > >
> > > Thank you!
> > >
> > > best regards,
> > > hanzhu
> > >
> > >
> > >
> > > On Thu, Dec 16, 2010 at 8:59 PM, Matthew Conway <m...@backupify.com> wrote:
> > >
> > >> Thanks for debugging this, I'm running into the same problem.
> > >> BTW, if you can ssh into your nodes, you can use jconsole over ssh:
> > >> http://simplygenius.com/2010/08/jconsole-via-socks-ssh-tunnel.html
> > >>
> > >> Matt
> > >>
> > >>
> > >> On Dec 16, 2010, at Thu Dec 16, 2:39 AM, Zhu Han wrote:
> > >>
> > >> > Sorry for the spam again. :-)
> > >> >
> > >> > I think I found the root cause. Here is a bug report [1] on a memory
> > >> > leak in ParNewGC. It is fixed in OpenJDK 1.6.0_20 (IcedTea6 1.9.2) [2].
> > >> >
> > >> > So my suggestion is: if you run Cassandra on Ubuntu 10.04, please
> > >> > upgrade OpenJDK to the latest version.
> > >> >
> > >> > [1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6824570
> > >> > [2] http://blog.fuseyism.com/index.php/2010/09/10/icedtea6-19-released/
> > >> >
> > >> > best regards,
> > >> > hanzhu
> > >> >
> > >> >
> > >> > On Thu, Dec 16, 2010 at 3:10 PM, Zhu Han <schumi....@gmail.com> wrote:
> > >> >
> > >> >> The test node is behind a firewall, so it took me some time to find a
> > >> >> way to get JMX diagnostic information from it.
> > >> >>
> > >> >> What's interesting is that both the HeapMemoryUsage and
> > >> >> NonHeapMemoryUsage reported by the JVM are quite reasonable. So it's a
> > >> >> mystery why the JVM process maps such a big anonymous memory region...
> > >> >>
> > >> >> $ java -Xmx128m -jar /tmp/cmdline-jmxclient-0.10.3.jar - localhost:8080 java.lang:type=Memory HeapMemoryUsage
> > >> >> 12/16/2010 15:07:45 +0800 org.archive.jmx.Client HeapMemoryUsage:
> > >> >> committed: 1065025536
> > >> >> init: 1073741824
> > >> >> max: 1065025536
> > >> >> used: 18295328
> > >> >>
> > >> >> $ java -Xmx128m -jar /tmp/cmdline-jmxclient-0.10.3.jar - localhost:8080 java.lang:type=Memory NonHeapMemoryUsage
> > >> >> 12/16/2010 15:01:51 +0800 org.archive.jmx.Client NonHeapMemoryUsage:
> > >> >> committed: 34308096
> > >> >> init: 24313856
> > >> >> max: 226492416
> > >> >> used: 21475376
> > >> >>
> > >> >> If anybody is interested, I can provide more diagnostic information
> > >> >> before I restart the instance.
> > >> >>
> > >> >> best regards,
> > >> >> hanzhu
> > >> >>
> > >> >>
> > >> >>
> > >> >> On Thu, Dec 16, 2010 at 1:00 PM, Zhu Han <schumi....@gmail.com> wrote:
> > >> >>
> > >> >>> After investigating more deeply, I suspect it's a native memory leak in
> > >> >>> the JVM. The large anonymous mapping in the lower address space should
> > >> >>> be the JVM's native heap, not the Java object heap. Has anybody seen
> > >> >>> this before?
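To track such regions over time, the large anonymous entries can be filtered out of pmap's output. This sketch assumes the classic `pmap <pid>` layout used throughout this thread (address, size with a K suffix, permissions, mapping name). The 100MB threshold and the PmapAnon name are arbitrary choices for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

final class PmapAnon {

    // Keeps "[ anon ]" mappings of at least minKb, as "<address> <size>K".
    // Assumes lines shaped like: 00000000080dc000 1573568K rw---    [ anon ]
    static List<String> largeAnonRegions(List<String> pmapLines, long minKb) {
        List<String> result = new ArrayList<String>();
        for (String line : pmapLines) {
            // Split into address, size, permissions and the rest (mapping name).
            String[] f = line.trim().split("\\s+", 4);
            if (f.length < 4 || !f[1].endsWith("K") || !f[3].equals("[ anon ]")) {
                continue; // header, "total" line, or a file-backed mapping
            }
            long kb = Long.parseLong(f[1].substring(0, f[1].length() - 1));
            if (kb >= minKb) {
                result.add(f[0] + " " + kb + "K");
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<String>();
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            lines.add(line);
        }
        for (String region : largeAnonRegions(lines, 100 * 1024)) { // >= 100MB
            System.out.println(region);
        }
    }
}
```

For example, pipe the output of `pmap <pid>` into it once a day and compare the sizes reported for the same address.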
> > >> >>>
> > >> >>> I'll try to upgrade the JVM tonight.
> > >> >>>
> > >> >>> best regards,
> > >> >>> hanzhu
> > >> >>>
> > >> >>>
> > >> >>>
> > >> >>> On Thu, Dec 16, 2010 at 10:50 AM, Zhu Han <schumi....@gmail.com> wrote:
> > >> >>>
> > >> >>>> Hi,
> > >> >>>>
> > >> >>>> I have a test node running apache-cassandra-0.6.8 on Ubuntu 10.04. The
> > >> >>>> hardware environment is an OpenVZ container. The JVM version is:
> > >> >>>> # java -Xmx128m -version
> > >> >>>> java version "1.6.0_18"
> > >> >>>> OpenJDK Runtime Environment (IcedTea6 1.8.2) (6b18-1.8.2-4ubuntu2)
> > >> >>>> OpenJDK 64-Bit Server VM (build 16.0-b13, mixed mode)
> > >> >>>>
> > >> >>>> These are the memory settings:
> > >> >>>>
> > >> >>>> "/usr/bin/java -ea -Xms1G -Xmx1G ..."
> > >> >>>>
> > >> >>>> And the on-disk footprint of the SSTables is very small:
> > >> >>>>
> > >> >>>> # du -sh data/
> > >> >>>> 9.8M    data/
> > >> >>>>
> > >> >>>> The node was accessed only infrequently over the last three weeks.
> > >> >>>> After that, I observed abnormal memory utilization in top:
> > >> >>>>
> > >> >>>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> > >> >>>> 7836 root      15   0 3300m 2.4g  13m S    0 26.0   2:58.51 java
> > >> >>>>
> > >> >>>> The JVM heap utilization is quite normal:
> > >> >>>>
> > >> >>>> # sudo jstat -gc -J"-Xmx128m" 7836
> > >> >>>> S0C    S1C    S0U   S1U EC      EU     OC       OU       PC      PU      YGC YGCT  FGC FGCT  GCT
> > >> >>>> 8512.0 8512.0 372.8 0.0 68160.0 5225.7 963392.0 508200.7 30604.0 18373.4 480 3.979 2   0.005 3.984
> > >> >>>>
> > >> >>>> Then I ran pmap to see the native memory mappings. There are two
> > >> >>>> large anonymous mmap regions:
> > >> >>>>
> > >> >>>> 00000000080dc000 1573568K rw---    [ anon ]
> > >> >>>> 00002b2afc900000  1079180K rw---    [ anon ]
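The jstat capacities above can be cross-checked against these two regions. Adding the young and old generation capacities (S0C + S1C + EC + OC) gives exactly 1048576 KB, i.e. the 1G set by -Xms/-Xmx; adding the permanent generation capacity (PC = 30604 KB) gives 1079180 KB, the exact size of the second region. That supports reading the second region as the Java heap plus the permanent generation, which leaves the first region unexplained by the Java heap. The class below (a made-up name) only redoes this arithmetic on the quoted numbers.

```java
final class JstatTotals {

    // Capacity columns (KB) copied from the jstat -gc output above.
    static final double S0C = 8512.0, S1C = 8512.0, EC = 68160.0, OC = 963392.0;
    static final double PC = 30604.0; // permanent generation capacity

    // Young generation (two survivors + eden) plus old generation, in KB.
    static double heapCapacityKb() {
        return S0C + S1C + EC + OC;
    }

    public static void main(String[] args) {
        double heap = heapCapacityKb();   // 1048576 KB, i.e. 1024 * 1024 KB = 1G
        double heapPlusPerm = heap + PC;  // 1079180 KB, the size of 00002b2afc900000
        System.out.printf("heap=%.0f KB, heap+perm=%.0f KB%n", heap, heapPlusPerm);
    }
}
```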
> > >> >>>>
> > >> >>>> The second one should be the JVM heap. What is the first one? An mmap
> > >> >>>> of an SSTable should never be an anonymous mmap, but a file-backed
> > >> >>>> one. Is it a native memory leak? Does Cassandra allocate any
> > >> >>>> DirectByteBuffers?
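Direct buffers are one source of native memory that jstat never shows: ByteBuffer.allocateDirect takes its storage from outside the Java object heap. The sketch below demonstrates this on Java 7 or later, where the BufferPoolMXBean interface reports the "direct" pool. The JVM in this thread (1.6.0_18) does not have that interface, and whether Cassandra 0.6.8 allocates direct buffers at all is not established here. DirectBufferDemo is a hypothetical name.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

final class DirectBufferDemo {

    // Total capacity in bytes of the "direct" buffer pool, or -1 if the
    // pool is not exposed. Requires Java 7+ (BufferPoolMXBean).
    static long directCapacity() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals("direct")) {
                return pool.getTotalCapacity();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directCapacity();
        // The 16MB payload lives in native memory, outside the Java object heap;
        // only the small ByteBuffer object itself is on the heap.
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 << 20);
        long after = directCapacity();
        System.out.printf("direct pool grew by %d bytes (buffer capacity %d)%n",
                after - before, buffer.capacity());
    }
}
```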
> > >> >>>>
> > >> >>>> best regards,
> > >> >>>> hanzhu
> > >> >>>>
> > >> >>>
> > >> >>>
> > >> >>
> > >>
> > >>
> > >
> >
>
