That sounds like a good idea. If you don't mind, please file an
issue and make a patch.
Thank you,
St.Ack
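
The wait-and-retry approach Jieshan proposes below could be sketched roughly as follows. This is an illustrative simulation, not HBase code: `SafeModeRetryDemo`, `MockSafeModeException`, and `splitLogWithRetry` are hypothetical names, and the fake `splitLog()` simply fails twice to stand in for a namenode that is temporarily in safemode.

```java
// Hypothetical sketch of "catch SafeModeException at the caller, wait, retry".
// None of these classes are Hadoop/HBase APIs; they only model the behavior.
public class SafeModeRetryDemo {

    // Stand-in for the HDFS SafeModeException.
    static class MockSafeModeException extends Exception {}

    static int splitAttempts = 0;

    // Simulated splitLog(): the "namenode" is in safemode for the first
    // two attempts, then recovers.
    static void splitLog() throws MockSafeModeException {
        splitAttempts++;
        if (splitAttempts < 3) {
            throw new MockSafeModeException();
        }
    }

    // Retry loop: on SafeModeException, wait and try again, up to maxRetries.
    // Returns false if the split never succeeded, so the caller can abort.
    static boolean splitLogWithRetry(int maxRetries, long waitMillis) {
        for (int i = 0; i < maxRetries; i++) {
            try {
                splitLog();
                return true;                       // split succeeded
            } catch (MockSafeModeException e) {
                try {
                    Thread.sleep(waitMillis);      // still in safemode; wait
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;                              // gave up
    }

    public static void main(String[] args) {
        // Succeeds on the third attempt.
        System.out.println(splitLogWithRetry(5, 10));  // prints "true"
    }
}
```

A real implementation would bound the total wait time and log each retry, since an abort is still the right outcome if the namenode never leaves safemode.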

On Sun, Apr 24, 2011 at 5:53 PM, bijieshan <[email protected]> wrote:
> Under the current hdfs version, there is no method to tell whether
> the namenode is in safemode.
> Maybe we can handle the SafeModeException at the top layer where
> checkFileSystem() is called, wait for a while, and retry the operation.
> Does that sound reasonable? I hope someone can give some advice.
>
> Thanks!
> Jieshan Bean
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
> Sent: April 24, 2011 5:55
> To: [email protected]
> Cc: Chenjian
> Subject: Re: Splitlog() executed while the namenode was in safemode may cause
> data-loss
>
> Sorry, what did you change?
> Thanks,
> St.Ack
>
> On Fri, Apr 22, 2011 at 9:00 PM, bijieshan <[email protected]> wrote:
>> Hi,
>> I found this problem while the namenode went into safemode due to some 
>> unclear reasons.
>> There's one patch about this problem:
>>
>>   try {
>>      HLogSplitter splitter = HLogSplitter.createLogSplitter(
>>        conf, rootdir, logDir, oldLogDir, this.fs);
>>      try {
>>        splitter.splitLog();
>>      } catch (OrphanHLogAfterSplitException e) {
>>        LOG.warn("Retrying splitting because of:", e);
>>        // An HLogSplitter instance can only be used once.  Get new instance.
>>        splitter = HLogSplitter.createLogSplitter(conf, rootdir, logDir,
>>          oldLogDir, this.fs);
>>        splitter.splitLog();
>>      }
>>      splitTime = splitter.getTime();
>>      splitLogSize = splitter.getSize();
>>    } catch (IOException e) {
>>      checkFileSystem();
>>      LOG.error("Failed splitting " + logDir.toString(), e);
>>      master.abort("Shutting down HBase cluster: Failed splitting hlog 
>> files...", e);
>>    } finally {
>>      this.splitLogLock.unlock();
>>    }
>>
>> It did provide some useful help to some extent, when the namenode
>> process exited or was killed, but it does not cover the case where the
>> namenode is in safemode.
>>   I think the root cause is the checkFileSystem() method.
>>   It is meant to check whether HDFS is working normally (reads and
>> writes succeed), and that was probably the original purpose of this
>> method. This is how the method is implemented:
>>
>>    DistributedFileSystem dfs = (DistributedFileSystem) fs;
>>    try {
>>      if (dfs.exists(new Path("/"))) {
>>        return;
>>      }
>>    } catch (IOException e) {
>>      exception = RemoteExceptionHandler.checkIOException(e);
>>    }
>>
>>   I have checked the hdfs code and learned that while the namenode is in
>> safemode, dfs.exists(new Path("/")) returns true, because the file
>> system can still provide read-only service. So this method only checks
>> whether the dfs can be read, which I think is not reasonable.
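
The point above can be shown with a small simulation. This is not Hadoop code: `MockDfs` and the probe-file methods are hypothetical stand-ins that model safemode behavior, where reads such as `exists("/")` still succeed but writes are rejected, so a read-only health check reports the filesystem as fine.

```java
// Illustrative simulation (MockDfs is a hypothetical class, not a Hadoop API):
// in safemode, exists("/") still returns true, so the read-only probe in
// checkFileSystem() passes even though every write would fail.
public class SafeModeCheckDemo {

    static class SafeModeIOException extends java.io.IOException {}

    static class MockDfs {
        final boolean inSafeMode;
        MockDfs(boolean inSafeMode) { this.inSafeMode = inSafeMode; }

        // Reads are served even in safemode, matching HDFS behavior.
        boolean exists(String path) { return true; }

        // Writes are rejected while the namenode is in safemode.
        void createProbeFile(String path) throws java.io.IOException {
            if (inSafeMode) throw new SafeModeIOException();
        }
        void deleteProbeFile(String path) {}
    }

    // The read-only style of check: passes even in safemode.
    static boolean checkReadable(MockDfs dfs) {
        return dfs.exists("/");
    }

    // A stricter check that also probes writability.
    static boolean checkWritable(MockDfs dfs) {
        try {
            dfs.createProbeFile("/.probe");
            dfs.deleteProbeFile("/.probe");
            return true;
        } catch (java.io.IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        MockDfs safemode = new MockDfs(true);
        System.out.println(checkReadable(safemode));  // prints "true": masks the problem
        System.out.println(checkWritable(safemode));  // prints "false": detects safemode
    }
}
```

In real HDFS one could also query safemode directly rather than probing with a write; either way, the key change is that the health check must exercise more than the read path.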
>>
>>
>> Regards,
>> Jieshan Bean
>>
>
