I was talking about the server which was anonymized: ***/***:60020

Cheers
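
For reference, here is a minimal sketch of how GC logging could be enabled on the region server behind ***/***:60020, so that a long pause can be ruled in or out the next time the timeout shows up. It assumes a HotSpot JDK 6, 7 or 8 recent enough to support the rotation flags and a plain conf/hbase-env.sh. The log directory and file name are hypothetical, and a cluster managed by Cloudera Manager may need the same options set through its own configuration instead.

```shell
# Sketch for conf/hbase-env.sh on a region server (HotSpot JDK 6/7/8 flags;
# JDK 9+ replaced them with -Xlog:gc).
# GC_LOG_DIR is a hypothetical location; use a directory the hbase user can write to.
GC_LOG_DIR="${GC_LOG_DIR:-/var/log/hbase}"

# Log every collection with details and wall-clock timestamps.
GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# Write to a file and rotate it, so the log stays bounded in size.
GC_OPTS="$GC_OPTS -Xloggc:$GC_LOG_DIR/gc-regionserver.log"
GC_OPTS="$GC_OPTS -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=20M"

# Append to whatever region server options are already configured.
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS:-} $GC_OPTS"
```

With timestamps in the log, a collection that lasts close to the 60000 ms in the SocketTimeoutException and lines up with the time of the client error would point at GC. If no such pause appears, the compaction load itself is the more likely suspect.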
On Fri, Dec 28, 2012 at 10:41 AM, Baugher,Bryan <bryan.baug...@cerner.com> wrote:

> On 12/28/12 12:14 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:
>
> >Looks like there was socket timeout :
> >
> >java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> >channel to be ready for read. ch :
> >java.nio.channels.SocketChannel[connected local=/***:39752
> >remote=***/***:60020]
> >
> >Have you collected / checked GC log on the server referenced above ?
>
> I am not sure exactly which server you are referring to. For the
> application server we don't currently collect gc logs. For hbase we do but
> the gc logs were truncated recently and won't help.
>
> >
> >BTW Have you considered deploying 0.92.2 in your cluster ?
>
> Not really. We have stuck with cloudera's distribution for a couple years
> now and I don't really see us going down that track.
>
> >
> >Thanks, glad to see Cerner using HBase.
> >
> >On Fri, Dec 28, 2012 at 9:40 AM, Baugher,Bryan
> ><bryan.baug...@cerner.com> wrote:
> >
> >> Hi everyone,
> >>
> >> For the past month or so we have noticed that some of our applications
> >> become frozen about once a day and need to be restarted in order to bring
> >> them back. We eventually figured out that it was caused by/happening during
> >> major compactions.
> >>
> >> We have automated major compactions disabled and are running them manually
> >> on each table sequentially each day starting at 4am. We are running on
> >> CDH4.1.1 (Hbase Version : 0.92.1-cdh4.1.1). Interestingly enough this is
> >> only happening in our dev environment with each region server serving ~650
> >> regions.
> >>
> >> Looking at the logs in HBase show that the compactions are occurring and
> >> this warning repeatedly while the compactions are occurring,
> >>
> >> WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call
> >> getHTableDescriptors(), rpc version=1, client version=29,
> >> methodsFingerPrint=400804878 from ***: output error
> >>
> >> Looking at our application logs we often see this error or a variation[1].
> >>
> >> I took a thread dump of our application while it was locked and saw that
> >> nearly all of the threads in the application were blocked by a single
> >> thread that was waiting on HBaseClient$Call[2].
> >>
> >> [1] - http://pastebin.com/P4skndEg
> >> [2] - http://pastebin.com/YLZn3SRK