Hey Friso,

Thanks so much for the details. I am starting to suspect it could indeed be
a codec leak - especially since you have some cells that are in the MB
range, maybe it's expanding some buffers to 64MB.
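
To make that concrete, the pattern I'd look for is something like the
following (purely an illustrative sketch, not actual hadoop-lzo code; the
class and field names are made up):

    import java.nio.ByteBuffer;

    // Hypothetical pattern for the kind of leak I mean: every reinit()
    // allocates a fresh direct buffer sized for the largest cell seen so
    // far and drops the old one. The old buffer's native memory is only
    // released when the GC collects its ByteBuffer object, which on a big
    // heap can take a very long time, so dropped buffers pile up outside
    // -Xmx and show in pmap as anon blocks, like the ones you saw.
    class LeakyCodecSketch {
      private ByteBuffer directBuf;

      void reinit(int requiredSize) {
        int size = Math.max(requiredSize, 64 * 1024 * 1024);
        directBuf = ByteBuffer.allocateDirect(size);
      }
    }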

Let me try to do some tests to reproduce it here in the next week or so.

Anyone else seen this issue?

Thanks
-Todd

On Fri, Nov 12, 2010 at 1:19 AM, Friso van Vollenhoven <
fvanvollenho...@xebia.com> wrote:

> Hi Todd,
>
> I am afraid I no longer have the broken setup around, because we really
> need a working one right now. We need to demo at a conference next week,
> and until after that, all changes are frozen on both dev and prod (so we
> can use dev as a fallback). Later on I could maybe try some more things
> on our dev boxes.
>
> If you are doing a repro, here's the stuff you'd probably want to know:
> The workload is write only. No reads happening at the same time, and no
> other active clients. It is an initial import of data. We do insertions
> in a MR job from the reducers. The total volume is about 11 billion puts
> across roughly 450K rows per table (we have a many-columns-per-row data
> model) across 15 tables, all of which use LZO. Qualifiers are some 50
> bytes. Values generally range from a few KB, up to MBs in rare cases. The
> row keys have a time-related part at the start, so I know the keyspace in
> advance and can create the empty tables with pre-split regions (40 per
> table) across the keyspace to get decent distribution from the start of
> the job. To avoid overloading HBase, I run the job with only 15 reducers,
> so at most 15 concurrent clients are active. Other settings: max file
> size is 1GB, HFile block size is the default 64K, client-side write
> buffer is 16M, memstore flush size is 128M, compaction threshold is 5,
> blocking store files is 9, memstore upper limit is 20%, lower limit 15%,
> block cache 40%. During the run, the RSes never report more than 5GB of
> heap usage in the UI, which makes sense, because the block cache is not
> touched. On a healthy run with somewhat conservative settings right now,
> HBase reports on average about 380K requests per second in the master UI.
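>
> In case it helps the repro, the table creation and client setup look
> roughly like this (a sketch from memory, so the exact admin API may
> differ between HBase versions; the table name, family name and key range
> are made up):
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.HColumnDescriptor;
>     import org.apache.hadoop.hbase.HTableDescriptor;
>     import org.apache.hadoop.hbase.client.HBaseAdmin;
>     import org.apache.hadoop.hbase.client.HTable;
>     import org.apache.hadoop.hbase.io.hfile.Compression;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     public class PreSplitSetup {
>       public static void main(String[] args) throws Exception {
>         Configuration conf = HBaseConfiguration.create();
>         HBaseAdmin admin = new HBaseAdmin(conf);
>
>         HTableDescriptor desc = new HTableDescriptor("inr_table");
>         HColumnDescriptor family = new HColumnDescriptor("d");
>         family.setCompressionType(Compression.Algorithm.LZO);
>         desc.addFamily(family);
>
>         // 40 regions = 39 split keys, spread evenly over the known
>         // time-prefixed keyspace (the range here is illustrative).
>         long start = 1262304000L, end = 1293840000L;
>         byte[][] splits = new byte[39][];
>         for (int i = 0; i < splits.length; i++) {
>           splits[i] = Bytes.toBytes(
>               String.format("%010d", start + (i + 1) * (end - start) / 40));
>         }
>         admin.createTable(desc, splits);
>
>         // Client side, per reducer: 16M write buffer, no auto-flush.
>         HTable table = new HTable(conf, "inr_table");
>         table.setAutoFlush(false);
>         table.setWriteBufferSize(16 * 1024 * 1024);
>       }
>     }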
>
> The cluster has 8 workers, each running a TT, DN, RS and another JVM
> process for our own software that sits in front of HBase. Workers are
> dual quad-cores with 64GB RAM and 10x 600GB disks (we decided to scale
> the number of seeks we can do concurrently). Disks are quite fast: 10K
> RPM. MR task JVMs get 1GB of heap each, as do the TT and DN. The RS gets
> 16GB of heap, and so does our own software. We run 8 mappers and 4
> reducers per node. So at the absolute max, we should have 46GB of
> allocated heap ((8 mappers + 4 reducers + TT + DN) x 1GB = 14GB, plus
> 16GB for the RS and 16GB for our software). That leaves 18GB for JVM
> overhead, native allocations and the OS. We run Linux
> 2.6.18-194.11.4.el5. I think it is CentOS, but I didn't do the installs
> myself.
>
> I tried numerous different settings, both more extreme and more
> conservative, to get the thing working, but in the end it always ended up
> swapping. I should have tried a run without LZO, of course, but I was out
> of time by then.
>
>
>
> Cheers,
> Friso
>
>
>
> On 12 nov 2010, at 07:06, Todd Lipcon wrote:
>
> > Hrm, any chance you can run with a smaller heap and get a jmap dump? The
> > Eclipse MAT tool is also super nice for looking at this stuff, if indeed
> > they are Java objects.
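> >
> > For reference, something along these lines should work on 6u21 (from
> > memory; <pid> is the RS process id):
> >
> >     jmap -dump:live,format=b,file=rs.hprof <pid>
> >
> > and then open rs.hprof in MAT. If the 64M blocks don't show up in the
> > dump, that points at native allocations rather than Java objects.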
> >
> > What kind of workload are you using? Read mostly? Write mostly? Mixed? I
> > will try to repro.
> >
> > -Todd
> >
> > On Thu, Nov 11, 2010 at 8:41 PM, Friso van Vollenhoven <
> > fvanvollenho...@xebia.com> wrote:
> >
> >> I figured the same. I also did a run with CMS instead of G1. Same
> >> results.
> >>
> >> I also did a run with the RS heap tuned down to 12GB and 8GB, but given
> >> enough time the process still grows to over 40GB in size.
> >>
> >>
> >> Friso
> >>
> >>
> >>
> >> On 12 nov 2010, at 01:55, Todd Lipcon wrote:
> >>
> >>> Can you try running this with CMS GC instead of G1GC? G1 still has some
> >>> bugs... 64M sounds like it might be G1 "regions"?
> >>>
> >>> -Todd
> >>>
> >>> On Thu, Nov 11, 2010 at 2:07 AM, Friso van Vollenhoven <
> >>> fvanvollenho...@xebia.com> wrote:
> >>>
> >>>> Hi All,
> >>>>
> >>>> (This is all about CDH3, so I am not sure whether it should go on
> >>>> this list, but I figure it is at least interesting for people trying
> >>>> the same.)
> >>>>
> >>>> I've recently tried CDH3 on a new cluster from RPMs with the
> >>>> hadoop-lzo fork from https://github.com/toddlipcon/hadoop-lzo.
> >>>> Everything works like a charm initially, but after some time (minutes
> >>>> to at most an hour), the RS JVM process memory grows to more than
> >>>> twice the given heap size and beyond. I have seen an RS with a 16GB
> >>>> heap grow to 55GB of virtual size. At some point, everything starts
> >>>> swapping, GC times go into the minutes, and everything dies or is
> >>>> considered dead by the master.
> >>>>
> >>>> I did a pmap -x on the RS process and that shows a lot of allocated
> >>>> blocks of about 64M by the process. There are about 500 of these,
> >>>> which is about 32GB in total. See: http://pastebin.com/8pgzPf7b
> >>>> (bottom of the file; the blocks of about 1M at the top are probably
> >>>> thread stacks). Unfortunately, Linux shows the native heap as anon
> >>>> blocks, so I cannot link it to a specific lib or anything.
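> >>>>
> >>>> If the anon blocks are NIO direct buffers, a small probe should
> >>>> reproduce the same pmap signature (untested sketch; run it, then
> >>>> compare pmap -x before and after):
> >>>>
> >>>>     import java.nio.ByteBuffer;
> >>>>     import java.util.ArrayList;
> >>>>     import java.util.List;
> >>>>
> >>>>     public class DirectAllocProbe {
> >>>>       public static void main(String[] args) throws Exception {
> >>>>         List<ByteBuffer> pinned = new ArrayList<ByteBuffer>();
> >>>>         for (int i = 0; i < 8; i++) {
> >>>>           // Each allocation lives outside the Java heap and should
> >>>>           // show up in pmap -x as a ~64M anon block.
> >>>>           pinned.add(ByteBuffer.allocateDirect(64 * 1024 * 1024));
> >>>>         }
> >>>>         System.out.println("holding " + pinned.size()
> >>>>             + " buffers; run pmap -x now");
> >>>>         Thread.sleep(Long.MAX_VALUE); // keep mappings alive
> >>>>       }
> >>>>     }
> >>>>
> >>>> (Run it with -XX:MaxDirectMemorySize=1g or so, in case the default
> >>>> direct memory limit is lower than 512M. Conversely, setting that flag
> >>>> on the RS would turn a direct buffer leak into a quick
> >>>> OutOfMemoryError instead of slow growth and swapping.)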
> >>>>
> >>>> I am running the latest CDH3 and hadoop-lzo 0.4.6 (from said URL, the
> >>>> one which has the reinit() support). I run Java 6u21 with the G1
> >>>> garbage collector, which has been running fine for some weeks now.
> >>>> The full command line is:
> >>>> java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError
> >>>> -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+UseCompressedOops
> >>>> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> >>>> -Xloggc:/export/logs/hbase/gc-hbase.log
> >>>> -Djava.library.path=/home/inr/java-lib/hbase/native/Linux-amd64-64
> >>>> -Djava.net.preferIPv4Stack=true -Dhbase.log.dir=/export/logs/hbase
> >>>> -Dhbase.log.file=hbase-hbase-regionserver-w3r1.inrdb.ripe.net.log
> >>>> -Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase -Dhbase.r
> >>>>
> >>>> I searched the HBase source for something that could point to native
> >>>> heap usage (like ByteBuffer#allocateDirect(...)), but I could not
> >>>> find anything. Thread count is about 185 (I have 100 handlers), so
> >>>> nothing strange there either.
> >>>>
> >>>> Question is, could this be HBase or is this a problem with
> >>>> hadoop-lzo?
> >>>>
> >>>> I have currently downgraded to a version known to work, because we
> >>>> have a demo coming up. But I am still interested in the answer.
> >>>>
> >>>>
> >>>>
> >>>> Regards,
> >>>> Friso
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Todd Lipcon
> >>> Software Engineer, Cloudera
> >>
> >>
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
