Inline...

> Hi Friso and everyone,
>
> OK. We don't have to spend time to juggle hadoop-core jars anymore since Todd
> is working hard on enhancing hadoop-lzo behavior.
>
> I think your assumption is correct, but what I was trying to say was HBase
> doesn't change the way to use Hadoop compressors since HBase 0.20 release
> while Hadoop added reinit() on 0.21. I verified that ASF Hadoop 0.21 and
> CDH3b3 have reinit() and ASF Hadoop 0.20.2 (including its append branch) and
> CDH3b2 don't. I saw you had no problem running HBase 0.89 on CDH3b2, so I
> thought HBase 0.90 would work fine on ASF Hadoop 0.20.2. Because both of them
> don't have reinit().
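(Side note: if anyone wants to check which of these two worlds their cluster is in, a quick reflection probe against the hadoop-core jar that actually sits in HBase's lib folder tells you whether Compressor.reinit(Configuration) exists at all. The little class below is just a throwaway sketch for illustration, not something from HBase or hadoop-lzo.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.Compressor;

    // Throwaway diagnostic: prints whether the Compressor interface on the
    // classpath has the reinit(Configuration) method that Hadoop added in 0.21.
    public class ReinitCheck {
      public static void main(String[] args) {
        try {
          Compressor.class.getMethod("reinit", Configuration.class);
          System.out.println("Compressor.reinit(Configuration) is present (0.21 / CDH3b3 style jar)");
        } catch (NoSuchMethodException e) {
          System.out.println("No Compressor.reinit() on the classpath (0.20.2 / CDH3b2 style jar)");
        }
      }
    }

Run it with the same hadoop-core jar on the classpath that the region servers use.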
Ah, so my mistake was that I thought using reinit() was something HBase specific, but it just depends on the Hadoop jar that you drop in the lib folder then. It's just that I never saw these problems in mappers and reducers, but only in the RS.

@Stack, to answer your question once more then: I don't think it's a problem with the way that HBase uses the compressors, but with the (LZO) compressor implementation in combination with the usage pattern that you get when running HBase with particular types of workloads.

> HBase tries to create an output compression stream on each compression block,
> and one HFile flush will contain roughly 1000 compression blocks. I think
> reinit() could get called 1000 times on one flush, and if hadoop-lzo
> allocates 64MB block on reinit() (HBase's compression blocks is about 64KB
> though), it will become pretty much something you're observing now.
>
> Thanks,
>
> --
> Tatsuya Kawano (Mr.)
> Tokyo, Japan
>
>
> On Jan 13, 2011, at 7:50 AM, Todd Lipcon <[email protected]> wrote:
>
>> Can someone who is having this issue try checking out the following git
>> branch and rebuilding LZO?
>>
>> https://github.com/toddlipcon/hadoop-lzo/tree/realloc
>>
>> This definitely stems one leak of a 64KB directbuffer on every reinit.
>>
>> -Todd
>>
>> On Wed, Jan 12, 2011 at 2:12 PM, Todd Lipcon <[email protected]> wrote:
>>
>>> Yea, you're definitely on the right track. Have you considered systems
>>> programming, Friso? :)
>>>
>>> Hopefully should have a candidate patch to LZO later today.
>>>
>>> -Todd
>>>
>>> On Wed, Jan 12, 2011 at 1:20 PM, Friso van Vollenhoven <
>>> [email protected]> wrote:
>>>
>>>> Hi,
>>>> My guess is indeed that it has to do with using the reinit() method on
>>>> compressors and making them long lived instead of throwaway, together with
>>>> the LZO implementation of reinit(), which magically causes NIO buffer
>>>> objects not to be finalized and as a result not release their native
>>>> allocations. It's just theory and I haven't had the time to properly verify
>>>> this (unfortunately, I spend most of my time writing application code), but
>>>> Todd said he will be looking into it further. I browsed the LZO code to see
>>>> what was going on there, but with my limited knowledge of the HBase code it
>>>> would be bold to say that this is for sure the case. It would be my first
>>>> direction of investigation. I would add some logging to the LZO code where
>>>> new direct byte buffers are created to log how often that happens and what
>>>> size they are, and then redo the workload that shows the leak. Together with
>>>> some profiling you should be able to see how long it takes for these to get
>>>> finalized.
>>>>
>>>> Cheers,
>>>> Friso
>>>>
>>>>
>>>>
>>>> On 12 jan 2011, at 20:08, Stack wrote:
>>>>
>>>>> 2011/1/12 Friso van Vollenhoven <[email protected]>:
>>>>>> No, I haven't. But the Hadoop (mapreduce) LZO compression is not the
>>>>>> problem. Compressing the map output using LZO works just fine. The problem
>>>>>> is HBase LZO compression. The region server process is the one with the
>>>>>> memory leak...
>>>>>>
>>>>>
>>>>> (Sorry for dumb question Friso) But HBase is leaking because we make
>>>>> use of the Compression API in a manner that produces leaks?
>>>>> Thanks,
>>>>> St.Ack
>>>>
>>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
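PS: to make the logging idea from my earlier mail (quoted above) a bit more concrete, something along these lines is what I have in mind. It's a standalone sketch with a made-up class name, not a patch against hadoop-lzo; the real thing would put the same logging directly at the spots in LzoCompressor where ByteBuffer.allocateDirect() is called, and then you rerun the workload that leaks and watch the counts and sizes.

    import java.nio.ByteBuffer;
    import java.util.concurrent.atomic.AtomicLong;

    // Illustrative only: funnel direct buffer allocations through one helper
    // that logs how often they happen, how big they are, and the running total.
    public class DirectBufferTracker {
      private static final AtomicLong TOTAL_BYTES = new AtomicLong();
      private static final AtomicLong COUNT = new AtomicLong();

      public static ByteBuffer allocateDirect(int capacity) {
        long n = COUNT.incrementAndGet();
        long total = TOTAL_BYTES.addAndGet(capacity);
        System.err.println("allocateDirect #" + n + ": " + capacity
            + " bytes, running total " + total + " bytes");
        return ByteBuffer.allocateDirect(capacity);
      }
    }

If the running total keeps climbing across HFile flushes while the old buffers never get finalized, that would back up the theory above.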
