Hrm, any chance you can run with a smaller heap and get a jmap dump? The Eclipse MAT tool is also super nice for looking at this stuff, if indeed they are Java objects.
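E.g., something along these lines should do it (the pid and dump path here are placeholders; adjust for your setup):

  jmap -dump:live,format=b,file=/tmp/rs-heap.hprof <regionserver pid>

MAT can open the resulting .hprof file directly.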
What kind of workload are you using? Read mostly? Write mostly? Mixed? I will try to repro.

-Todd

On Thu, Nov 11, 2010 at 8:41 PM, Friso van Vollenhoven <fvanvollenho...@xebia.com> wrote:

> I figured the same. I also did a run with CMS instead of G1. Same results.
>
> I also did a run with the RS heap tuned down to 12GB and 8GB, but given enough time the process still grows over 40GB in size.
>
> Friso
>
> On 12 nov 2010, at 01:55, Todd Lipcon wrote:
>
> > Can you try running this with CMS GC instead of G1GC? G1 still has some bugs... 64M sounds like it might be G1 "regions"?
> >
> > -Todd
> >
> > On Thu, Nov 11, 2010 at 2:07 AM, Friso van Vollenhoven <fvanvollenho...@xebia.com> wrote:
> >
> >> Hi All,
> >>
> >> (This is all about CDH3, so I am not sure whether it should go on this list, but I figure it is at least interesting for people trying the same.)
> >>
> >> I've recently tried CDH3 on a new cluster from RPMs with the hadoop-lzo fork from https://github.com/toddlipcon/hadoop-lzo. Everything works like a charm initially, but after some time (minutes to max one hour) the RS JVM process memory grows to more than twice the given heap size and beyond. I have seen a RS with a 16GB heap grow to 55GB virtual size. At some point everything starts swapping, GC times go into the minutes, and everything dies or is considered dead by the master.
> >>
> >> I did a pmap -x on the RS process and that shows a lot of allocated blocks of about 64M by the process. There are about 500 of these, which is 32GB in total. See: http://pastebin.com/8pgzPf7b (bottom of the file; the blocks of about 1M on top are probably thread stacks). Unfortunately, Linux shows the native heap as anon blocks, so I cannot link it to a specific lib or anything.
> >>
> >> I am running the latest CDH3 and hadoop-lzo 0.4.6 (from said URL, the one which has the reinit() support). I run Java 6u21 with the G1 garbage collector, which has been running fine for some weeks now. The full command line is:
> >>
> >> java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+UseCompressedOops -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/export/logs/hbase/gc-hbase.log -Djava.library.path=/home/inr/java-lib/hbase/native/Linux-amd64-64 -Djava.net.preferIPv4Stack=true -Dhbase.log.dir=/export/logs/hbase -Dhbase.log.file=hbase-hbase-regionserver-w3r1.inrdb.ripe.net.log -Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase -Dhbase.r
> >>
> >> I searched the HBase source for something that could point to native heap usage (like ByteBuffer#allocateDirect(...)), but I could not find anything. Thread count is about 185 (I have 100 handlers), so nothing strange there either.
> >>
> >> Question is: could this be HBase, or is this a problem with hadoop-lzo?
> >>
> >> I have currently downgraded to a version known to work, because we have a demo coming up. But still interested in the answer.
> >>
> >> Regards,
> >> Friso
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera
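(For reference, a rough way to tally the ~64M anon blocks Friso describes in the pmap output, assuming the region server pid is known; the pid and the KB size bounds below are placeholders, not exact values:

  pmap -x <regionserver pid> | awk '$2 >= 60000 && $2 <= 70000' | wc -l

Each matching line is one mapping whose size column, in KB, is in the neighborhood of 64M, so the count times 64M approximates the native memory those blocks account for.)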