I believe that is one of our region servers. I will have to wait until
tomorrow to check its GC logs.

On 12/28/12 12:45 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:

>I was talking about the server which was anonymized:
>***/***:60020
>
>Cheers
>
>On Fri, Dec 28, 2012 at 10:41 AM, Baugher,Bryan
><bryan.baug...@cerner.com> wrote:
>
>>
>>
>> On 12/28/12 12:14 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:
>>
>> >Looks like there was socket timeout :
>> >
>> >java.net.SocketTimeoutException: 60000 millis timeout while waiting for
>> >channel to be ready for read. ch :
>> >java.nio.channels.SocketChannel[connected local=/***:39752
>> >remote=***/***:60020]
>> >
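[Editor's note: the 60000 ms in the exception above matches the HBase client's default RPC timeout (`hbase.rpc.timeout`). If the region server is legitimately slow to respond during major compactions, one stopgap is raising that timeout on the client side. A sketch of the hbase-site.xml entry; the 600000 value is purely illustrative, not a recommendation:]

```xml
<!-- Client-side hbase-site.xml: raise the RPC timeout from the 60s default.
     600000 ms (10 minutes) is an illustrative value only. -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>600000</value>
</property>
```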
>> >Have you collected / checked GC log on the server referenced above ?
>>
>> I am not sure exactly which server you are referring to. For the
>> application server we don't currently collect GC logs. For HBase we do,
>> but the GC logs were truncated recently and won't help.
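[Editor's note: for future incidents it may be worth enabling GC logging on the application JVMs as well; a sketch using the standard HotSpot flags of that era, where the log path is a placeholder:]

```
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:/var/log/app/gc.log
```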
>>
>> >
>> >BTW Have you considered deploying 0.92.2 in your cluster ?
>>
>> Not really. We have stuck with Cloudera's distribution for a couple of
>> years now and I don't really see us going down that track.
>>
>> >
>> >Thanks, glad to see Cerner using HBase.
>> >
>> >On Fri, Dec 28, 2012 at 9:40 AM, Baugher,Bryan
>> ><bryan.baug...@cerner.com> wrote:
>> >
>> >> Hi everyone,
>> >>
>> >> For the past month or so we have noticed that some of our applications
>> >> become frozen about once a day and need to be restarted to bring them
>> >> back. We eventually figured out that this was caused by, or at least
>> >> happening during, major compactions.
>> >>
>> >> We have automated major compactions disabled and run them manually on
>> >> each table sequentially each day, starting at 4am. We are running on
>> >> CDH4.1.1 (HBase version: 0.92.1-cdh4.1.1). Interestingly enough, this
>> >> is only happening in our dev environment, with each region server
>> >> serving ~650 regions.
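[Editor's note: the sequential manual compactions described above could be driven by a cron job along these lines; the table names are illustrative, and `major_compact` is the standard hbase shell command:]

```
#!/bin/sh
# Major-compact each table one at a time (table names are placeholders).
for table in table_one table_two table_three; do
  echo "major_compact '$table'" | hbase shell
done
```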
>> >>
>> >> Looking at the HBase logs shows that the compactions are occurring,
>> >> along with this warning repeated while the compactions run:
>> >>
>> >> WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call
>> >> getHTableDescriptors(), rpc version=1, client version=29,
>> >> methodsFingerPrint=400804878 from ***: output error
>> >>
>> >> Looking at our application logs, we often see this error or a
>> >> variation of it [1].
>> >>
>> >> I took a thread dump of our application while it was locked up and saw
>> >> that nearly all of the threads in the application were blocked behind a
>> >> single thread that was waiting on HBaseClient$Call [2].
>> >>
>> >> [1] - http://pastebin.com/P4skndEg
>> >> [2] - http://pastebin.com/YLZn3SRK
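[Editor's note: the blocked-behind-one-thread pattern visible in the thread dump [2] can also be detected programmatically with the JDK's ThreadMXBean, which is handy for automated monitoring. A self-contained sketch: the lock contention below is simulated, and in the real application the "holder" role would be played by the thread stuck in HBaseClient$Call:]

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

public class BlockedThreadReport {

    // For each lock-owning thread id, count how many threads are blocked behind it.
    static Map<Long, Integer> blockedByOwner(ThreadInfo[] infos) {
        Map<Long, Integer> counts = new HashMap<>();
        for (ThreadInfo ti : infos) {
            if (ti == null) continue;
            long owner = ti.getLockOwnerId(); // -1 when not waiting on an owned lock
            if (owner != -1) {
                counts.merge(owner, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        final Object lock = new Object();

        // Simulate one stuck thread holding a lock (in the real app this was
        // the thread waiting on HBaseClient$Call) ...
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                try { Thread.sleep(2000); } catch (InterruptedException ignored) { }
            }
        }, "holder");
        holder.start();
        Thread.sleep(200); // let the holder acquire the lock

        // ... and two threads piling up behind it.
        Runnable contender = () -> { synchronized (lock) { } };
        Thread c1 = new Thread(contender, "c1");
        Thread c2 = new Thread(contender, "c2");
        c1.start();
        c2.start();
        Thread.sleep(500); // let the contenders block on the monitor

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo[] infos = mx.getThreadInfo(mx.getAllThreadIds());
        int behindHolder = blockedByOwner(infos).getOrDefault(holder.getId(), 0);
        System.out.println("threads blocked behind holder: " + behindHolder);

        holder.join();
        c1.join();
        c2.join();
    }
}
```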
>> >>
>> >>
>>
>>
