This can be considered a bug. The client is not going to survive long after an OOM, so retrying is not very useful. You can create a JIRA for this (https://issues.apache.org/jira/browse/HBASE) and submit a patch if possible.
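To make the idea concrete, here is a minimal sketch of the kind of check such a patch might add. It is not the actual HBase code: the class name RetryDecision and the method shouldRetry are hypothetical and only show that a JVM Error such as OutOfMemoryError should never be treated as retriable, whatever the incoming canRetry flag says.

```java
// Hypothetical helper for illustration only; not part of the HBase client API.
final class RetryDecision {
    private RetryDecision() {}

    /**
     * Decides whether a failed operation may be retried.
     * Any java.lang.Error (OutOfMemoryError, StackOverflowError, ...) signals
     * that the JVM itself is in trouble, so it is never considered retriable.
     * For every other throwable the incoming canRetry flag is kept as-is.
     */
    static boolean shouldRetry(boolean canRetry, Throwable throwable) {
        if (throwable instanceof Error) {
            return false;
        }
        return canRetry;
    }
}
```

A caller would probably also want to rethrow or otherwise surface the Error, so that the application can react to it instead of only seeing a failed retry.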
As for the root cause: if you don't have much memory, you can lower the write buffer size (hbase.client.write.buffer) or the number of simultaneous tasks (hbase.client.max.total.tasks).

On Fri, May 23, 2014 at 5:44 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> w.r.t. your questions toward the bottom of your email:
> CDH HBase is one distribution of Apache HBase. As you can see from the URL,
> it is based on 0.96.1.1.
> HDP is another distribution.
>
> There are many projects under the Apache Software Foundation, such as YARN,
> HDFS, HBase, and Phoenix.
> You can get more information on each project through its JIRA,
> mailing lists, source repository, etc.
>
> You can get the 0.96 source code using the following command:
> git clone -b 0.96 https://git-wip-us.apache.org/repos/asf/hbase.git <local directory name>
>
> You can also get more help from the CDH mailing list.
>
> Cheers
>
> On Thu, May 22, 2014 at 7:38 PM, jingych <jing...@neusoft.com> wrote:
>
> > Hi, JM & everyone!
> >
> > Thanks for your reply!
> >
> > My Java client uses the HTable#put(Put) method with a 2M buffer to commit
> > rows.
> > After the client has been running for a while, jvisualvm shows the number
> > of htable threads increasing, and the heap size increasing too.
> > Here is the snapshot:
> >
> > I looked into the implementation of HTable#put(Put).
> > I found that the doPut operation is async, and when one row succeeds, it
> > returns.
> >
> > So I guess that when the HBase server is slow to respond to put requests,
> > the HBase client accumulates many rows.
> > On the other hand, when a put operation fails, the HBase client
> > retries through a recursive call to the submit method.
> >
> > I think these two are the reasons for the OOME.
> >
> > PS:
> > What do you mean by "It's not CDH5 specific"? I found it on the website
> > http://archive.cloudera.com/cdh5/cdh/5/hbase-0.96.1.1-cdh5.0.0/
> > Where can I get the CDH5 source code?
> > Does CDH5 mean HBase 0.96.2?
> > I'm confused about CDH5, Apache, Phoenix ...
> > What is the relationship among them?
> >
> > Thanks a lot!
> >
> > Best regards!
> >
> > ------------------------------
> >
> > Jingych
> >
> > *From:* Jean-Marc Spaggiari <jean-m...@spaggiari.org>
> > *Date:* 2014-05-23 09:33
> > *To:* user <user@hbase.apache.org>; jingych <jing...@neusoft.com>
> > *Subject:* Re: CDH5 hbase client outofmemory
> > Hi Jingych,
> >
> > This is the HBase 0.96.2 code. Not CDH5 specific. Do you have more details
> > on your OOME? Did you figure out why it occurred?
> >
> > JM
> >
> > 2014-05-22 1:20 GMT-04:00 jingych <jing...@neusoft.com>:
> >
> >> Hello, everyone!
> >>
> >> I found that the CDH5 HBase client swallows the OutOfMemory exception.
> >>
> >> It is not thrown out, so the program cannot respond correctly to
> >> the OOM.
> >>
> >> Is it good to catch the client OOM? Why?
> >>
> >> The CDH5 HBase client code that processes the throwable:
> >> private boolean manageError(int originalIndex, Row row, boolean canRetry,
> >>                             Throwable throwable, HRegionLocation location) {
> >>   if (canRetry && throwable != null && throwable instanceof DoNotRetryIOException) {
> >>     canRetry = false;
> >>   }
> >>
> >>   byte[] region = null;
> >>   if (canRetry && callback != null) {
> >>     region = location == null ? null : location.getRegionInfo().getEncodedNameAsBytes();
> >>     canRetry = callback.retriableFailure(originalIndex, row, region, throwable);
> >>   }
> >>
> >>   if (!canRetry) {
> >>     if (callback != null) {
> >>       if (region == null && location != null) {
> >>         region = location.getRegionInfo().getEncodedNameAsBytes();
> >>       }
> >>       callback.failure(originalIndex, region, row, throwable);
> >>     }
> >>     errors.add(throwable, row, location);
> >>     this.hasError.set(true);
> >>   }
> >>
> >>   return canRetry;
> >> }
> >>
> >> This treats the OOM as a retriable exception.
> >>
> >> Thanks!
> >>
> >> Best Regards!
> >>
> >> jingych
> >>
> >> ---------------------------------------------------------------------------------------------------
> >> Confidentiality Notice: The information contained in this e-mail and any
> >> accompanying attachment(s)
> >> is intended only for the use of the intended recipient and may be
> >> confidential and/or privileged of
> >> Neusoft Corporation, its subsidiaries and/or its affiliates. If any
> >> reader of this communication is
> >> not the intended recipient, unauthorized use, forwarding, printing,
> >> storing, disclosure or copying
> >> is strictly prohibited, and may be unlawful. If you have received this
> >> communication in error, please
> >> immediately notify the sender by return e-mail, and delete the original
> >> message and all copies from
> >> your system. Thank you.
> >> ---------------------------------------------------------------------------------------------------