I believe that is one of our region servers; I will have to wait until tomorrow to check its GC logs.
On 12/28/12 12:45 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:

>I was talking about the server which was anonymized:
>***/***:60020
>
>Cheers
>
>On Fri, Dec 28, 2012 at 10:41 AM, Baugher,Bryan <bryan.baug...@cerner.com> wrote:
>
>> On 12/28/12 12:14 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:
>>
>> >Looks like there was a socket timeout:
>> >
>> >java.net.SocketTimeoutException: 60000 millis timeout while waiting for
>> >channel to be ready for read. ch :
>> >java.nio.channels.SocketChannel[connected local=/***:39752
>> >remote=***/***:60020]
>> >
>> >Have you collected / checked the GC log on the server referenced above?
>>
>> I am not sure exactly which server you are referring to. For the
>> application server we don't currently collect GC logs. For HBase we do, but
>> the GC logs were truncated recently and won't help.
>>
>> >BTW, have you considered deploying 0.92.2 in your cluster?
>>
>> Not really. We have stuck with Cloudera's distribution for a couple of
>> years now and I don't really see us going down that track.
>>
>> >Thanks, glad to see Cerner using HBase.
>> >
>> >On Fri, Dec 28, 2012 at 9:40 AM, Baugher,Bryan <bryan.baug...@cerner.com> wrote:
>> >
>> >> Hi everyone,
>> >>
>> >> For the past month or so we have noticed that some of our applications
>> >> become frozen about once a day and need to be restarted in order to
>> >> bring them back. We eventually figured out that it was caused by, and
>> >> happening during, major compactions.
>> >>
>> >> We have automated major compactions disabled and run them manually on
>> >> each table sequentially each day, starting at 4am. We are running on
>> >> CDH4.1.1 (HBase version: 0.92.1-cdh4.1.1). Interestingly enough, this
>> >> is only happening in our dev environment, with each region server
>> >> serving ~650 regions.
>> >>
>> >> Looking at the logs in HBase shows that the compactions are occurring,
>> >> and this warning appears repeatedly while they run:
>> >>
>> >> WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call
>> >> getHTableDescriptors(), rpc version=1, client version=29,
>> >> methodsFingerPrint=400804878 from ***: output error
>> >>
>> >> Looking at our application logs, we often see this error or a
>> >> variation[1].
>> >>
>> >> I took a thread dump of our application while it was locked and saw
>> >> that nearly all of the threads in the application were blocked by a
>> >> single thread that was waiting on HBaseClient$Call[2].
>> >>
>> >> [1] - http://pastebin.com/P4skndEg
>> >> [2] - http://pastebin.com/YLZn3SRK
>> >>
>> >> CONFIDENTIALITY NOTICE This message and any included attachments are
>> >> from Cerner Corporation and are intended only for the addressee. The
>> >> information contained in this message is confidential and may
>> >> constitute inside or non-public information under international,
>> >> federal, or state securities laws. Unauthorized forwarding, printing,
>> >> copying, distribution, or use of such information is strictly
>> >> prohibited and may be unlawful. If you are not the addressee, please
>> >> promptly delete this message and notify the sender of the delivery
>> >> error by e-mail or you may call Cerner's corporate offices in Kansas
>> >> City, Missouri, U.S.A at (+1) (816)221-1024.
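For what it's worth, the "60000 millis timeout while waiting for channel to be ready for read" in the stack trace is the client giving up on a socket read because the server sent nothing back within the configured timeout. A minimal plain-java.net sketch of that failure mode (illustrative only, not HBase code; the server here deliberately accepts and then stays silent, like a region server stalled in a long GC pause):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            // Server thread: accept the connection but never write anything,
            // simulating a region server that is alive at the TCP level but
            // not responding to the RPC.
            new Thread(() -> {
                try {
                    Socket s = server.accept();
                    Thread.sleep(1_000);
                    s.close();
                } catch (Exception e) { /* demo only */ }
            }).start();

            try (Socket client = new Socket("localhost", server.getLocalPort())) {
                client.setSoTimeout(200); // the HBase client's equivalent was 60,000 ms
                try {
                    client.getInputStream().read(); // blocks until data or timeout
                    System.out.println("got a response");
                } catch (SocketTimeoutException e) {
                    System.out.println("timed out waiting for the server");
                }
            }
        }
    }
}
```

A GC log from the region server covering the compaction window would show whether a stop-the-world pause is what kept the server silent past the timeout.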
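The thread-dump symptom, many application threads piled up behind a single thread waiting on HBaseClient$Call, is consistent with callers funneling their RPCs through one shared connection to a region server, so one stalled call delays everyone behind it. A toy sketch of that pattern using an ordinary shared lock (illustrative only; SharedConnectionStall and its timings are invented for the demo, not the actual HBase client internals):

```java
public class SharedConnectionStall {
    // Stands in for the per-server connection that callers must share.
    private final Object connectionLock = new Object();

    // Simulates sending one RPC over the shared connection.
    int call(int millis) throws InterruptedException {
        synchronized (connectionLock) {
            Thread.sleep(millis); // a slow server holds the lock for everyone
            return millis;
        }
    }

    public static void main(String[] args) throws Exception {
        SharedConnectionStall client = new SharedConnectionStall();
        Thread slow = new Thread(() -> {
            try {
                client.call(1_000); // one call stuck on a slow server
            } catch (InterruptedException ignored) { }
        });
        slow.start();
        Thread.sleep(100); // let the slow call take the lock first

        long start = System.nanoTime();
        client.call(1); // would finish in ~1 ms, but must wait for the slow call
        long waitedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("fast call took ~" + waitedMs + " ms");
        slow.join();
    }
}
```

In a dump of this toy program, the "fast" thread shows up BLOCKED on the monitor the slow one holds, which is the same shape as the pastebin trace: one waiter on the call, everyone else queued behind it.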