Hello,

This is the first time I am sending a query to the hbase mailing list. Hopefully 
this is the correct group for hbase/hadoop related questions.

I am running hbase 0.92, hadoop 2.0 (cdh 4.1.3). Recently, there was some 
instability in my DNS service, and host lookup requests failed occasionally. 
During such failures, a random region server would shut itself down when it 
encountered a fatal exception during a log roll operation. The DNS issue was 
eventually resolved and the region server fatals stopped.

While I was trying to understand the hbase/hadoop behavior during network 
events/blips, I found that a default retry policy is used - 
TRY_ONCE_THEN_FAIL. Please correct me if that's not the case.

But then I was thinking that there could be more of these blips during network 
or other infrastructure maintenance operations, and such maintenance should 
not result in a region server going down. If the client simply attempted the 
host lookup one more time, the request would most likely succeed.
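Just to illustrate what I mean (this is a generic sketch, not the actual Hadoop retry code or its RetryPolicies API): wrapping the failing operation in a small retry loop with a short pause would ride out a brief DNS blip instead of failing on the first attempt.

```java
import java.util.concurrent.Callable;

// Generic sketch of "try once more" behavior for a flaky operation
// such as a host lookup. Names here (callWithRetry, RetryExample)
// are illustrative, not from Hadoop.
public class RetryExample {
    // Runs `op` up to `maxAttempts` times, sleeping `sleepMs` between
    // failed attempts; rethrows the last failure if every attempt fails.
    static <T> T callWithRetry(Callable<T> op, int maxAttempts, long sleepMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMs);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulate a DNS blip: the first lookup fails, the second succeeds.
        final int[] calls = {0};
        String host = callWithRetry(() -> {
            if (++calls[0] == 1) {
                throw new java.net.UnknownHostException("hadoop0104111601");
            }
            return "hadoop0104111601";
        }, 3, 10);
        System.out.println("resolved " + host + " after " + calls[0] + " attempts");
    }
}
```

With a policy like this, a lookup that fails once during a short blip resolves on the second attempt rather than aborting the caller.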

If anyone has had a similar experience, could you please share? Are there 
options one can try to guard against such failures?

Maybe I am not thinking in the right direction, but this behavior makes me 
feel that hbase (on hdfs) is sensitive to DNS service availability. DNS 
unavailability for even a few seconds could bring down the entire cluster (a 
rare chance, but possible if all region servers attempt to roll their hlogs at 
the same time).

Here is the stack trace:
2014-08-17 11:14:48,706 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server hadoop0104111601,60020,1408273008941: IOE in log roller
java.io.IOException: cannot get log writer
        at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:716)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:663)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:595)
        at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:713)
        ... 4 more
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:122)
        at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:148)
        at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:233)
        at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:321)
        at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:319)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
        at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:319)
        at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:432)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:469)
        at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)
        ... 5 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedConstructorAccessor20.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:120)
        ... 19 more
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: hadoop0104111601
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164)
        at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.getProxy(ConfiguredFailoverProxyProvider.java:125)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:60)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:51)
        at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:137)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:389)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:356)
        at org.apache.hadoop.fs.Hdfs.<init>(Hdfs.java:84)
        ... 23 more
Caused by: java.net.UnknownHostException: hadoop0104111601
        ... 33 more

thanks, 
Arun
