You should take a look at that server's region server process to check its health; it should recover. Use jps to see if the process is still running, maybe tail the log to see what's going on, and worst case you can kill -9 it. For how long was the master stuck? I remember there was an issue for some time with 0.89, HBASE-2975; can you verify that the version you're running has that fix? Check the CHANGES file. The version we currently have on github has it.
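Something along these lines, for example (the log path here is just a
guess, point it at wherever your region server actually writes its logs):

  # on the region server box (10.103.5.6)
  jps | grep HRegionServer      # is the RS JVM still running?
  tail -f /var/log/hbase/hbase-regionserver-mtae6.log  # hypothetical path
  kill -9 <pid from jps>        # last resort, if it's wedged and won't exit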
I agree the master should be able to ride over that, but the issue is at
the HDFS level. If I remember correctly, the append implementation in
hadoop 0.21 doesn't have that problem, but HBase doesn't support it at the
moment. Also, 0.21 is unstable (it didn't go through Y!'s QA as much as
the other releases did). The other option HBase has is what you did:
either removing the file or just ignoring it (see the commands at the
bottom of this mail for the kind of thing I mean). In both cases, you do
lose data.

J-D

On Thu, Oct 21, 2010 at 10:48 AM, Jack Levin <magn...@gmail.com> wrote:
> How do we best resolve something like that? I just deleted that
> file... does it mean I might have lost inserts?
>
> -Jack
>
> On Thu, Oct 21, 2010 at 10:32 AM, Jean-Daniel Cryans
> <jdcry...@apache.org> wrote:
>> This can happen when the original owner of the file is still alive. In
>> your case, is the region server it's recovering from (10.103.5.6) still
>> running? If it GCed hard, then it probably stayed "alive" for a while,
>> but it should shut down when it wakes up.
>>
>> J-D
>>
>> On Thu, Oct 21, 2010 at 10:10 AM, Jack Levin <magn...@gmail.com> wrote:
>>> 2010-10-21 10:08:14,268 WARN org.apache.hadoop.hbase.util.FSUtils:
>>> Waited 2014334ms for lease recovery on
>>> hdfs://namenode-rd.imageshack.us:9000/hbase/.logs/mtae6.prod.imageshack.com,60020,1287624295377/10.103.5.6%3A60020.1287672366636:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:
>>> failed to create file
>>> /hbase/.logs/mtae6.prod.imageshack.com,60020,1287624295377/10.103.5.6%3A60020.1287672366636
>>> for DFSClient_hb_m_10.101.7.1:60000_1287616820725 on client
>>> 10.101.7.1, because this file is already being created by NN_Recovery
>>> on 10.103.5.6
>>>
>>> Seems like a new problem we discovered - any ideas what this means?
>>>
>>> -Jack
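PS: by "removing the file" I mean something along these lines (the file
name is taken from the error in your mail above; moving it to a backup
location instead of deleting it is just a suggestion, and /tmp/hlog-backup
is an example path):

  # move the stuck log aside so the bytes are at least still around:
  hadoop fs -mv /hbase/.logs/mtae6.prod.imageshack.com,60020,1287624295377/10.103.5.6%3A60020.1287672366636 /tmp/hlog-backup

  # or delete it outright, which is what you did:
  hadoop fs -rm /hbase/.logs/mtae6.prod.imageshack.com,60020,1287624295377/10.103.5.6%3A60020.1287672366636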