Hi,
I added HBASE-4202 for this. Hope stacktraces are enough.

Matthias

On 8/11/11, Stack <[email protected]> wrote:
> Mind making an issue and pasting full stack traces with some
> surrounding log. My guess is we likely will do same in 0.90. Your
> snippets will help us figure where to dig in.
>
> Thanks Matthias,
> St.Ack
>
> On Thu, Aug 11, 2011 at 2:33 AM, Matthias Hofschen <[email protected]> wrote:
>> Hi,
>> we had an interesting failure yesterday on the old 0.20.4 version of hbase.
>> I realize that this is a very old version but am wondering whether this is
>> an issue that is still present and should be fixed.
>> We added a new node to a 44 node cluster starting the datanode and
>> regionserver processes on it. The Unix filesystem was configured
>> incorrectly, i.e. /tmp was not writable to hadoop process. Both datanode
>> and regionserver processes had issues with the permissions.
>>
>> The datanode process stopped with an error message:
>>
>> 2011-08-06 23:37:20,469 WARN org.mortbay.log: tmpdir
>> java.io.IOException: Permission denied
>>     at java.io.UnixFileSystem.createFileExclusively(Native Method)
>>     at java.io.File.checkAndCreate(File.java:1704)
>>     at java.io.File.createTempFile(File.java:1792)
>>     at java.io.File.createTempFile(File.java:1828)
>>     at org.mortbay.jetty.webapp.WebAppContext.getTempDirectory(WebAppContext.java:745)
>>     ....
>> 2011-08-06 23:37:20,471 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down DataNode at hdpxxx
>> ************************************************************/
>>
>> The regionserver did not stop even though the error was logged:
>>
>> 2011-08-07 00:07:39.742::WARN: tmpdir
>> java.io.IOException: Permission denied
>>     at java.io.UnixFileSystem.createFileExclusively(Native Method)
>>     at java.io.File.checkAndCreate(File.java:1704)
>>     at java.io.File.createTempFile(File.java:1792)
>>     at java.io.File.createTempFile(File.java:1828)
>>     at org.mortbay.jetty.webapp.WebAppContext.getTempDirectory(WebAppContext.java:745)
>>     .......
>>     at org.apache.hadoop.http.HttpServer.start(HttpServer.java:461)
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.startServiceThreads(HRegionServer.java:1168)
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.init(HRegionServer.java:792)
>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:430)
>>
>> In fact to the master process the regionserver looked fine, so it was
>> trying to send regions its way. Regionserver rejected them. So the
>> master/balancer was going into a assign/reassign cycle destabilizing the
>> cluster. Many puts and gets simply failed with NotServingRegionExceptions
>> and took a long time to complete.
>>
>> Please advise whether this may be a problem in 0.90x code.
>>
>> Cheers Matthias
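
For anyone hitting the same misconfiguration: the traces above fail in java.io.File.createTempFile, so the condition can be probed before a node joins the cluster. Below is a minimal sketch of such a fail-fast check. It is not HBase or Hadoop code; the class TmpDirCheck and its methods isWritable and requireWritable are hypothetical names chosen for illustration, and it assumes the directory of interest is the one named by the java.io.tmpdir system property.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of HBase or Hadoop.
final class TmpDirCheck {

    // Returns true if a temp file can be created in dir and then deleted.
    // Uses the same JDK call (File.createTempFile) that fails in the traces.
    static boolean isWritable(File dir) {
        try {
            File probe = File.createTempFile("probe", ".tmp", dir);
            return probe.delete();
        } catch (IOException | SecurityException e) {
            // Missing directory or denied permission both count as not writable.
            return false;
        }
    }

    // Throws instead of only logging, so a startup script or wrapper can abort.
    static void requireWritable(File dir) throws IOException {
        if (!isWritable(dir)) {
            throw new IOException("Temp directory not writable: " + dir);
        }
    }
}
```

Calling TmpDirCheck.requireWritable(new File(System.getProperty("java.io.tmpdir"))) as the hadoop user before starting the daemons would surface the permission problem up front, rather than leaving a regionserver running that the master still considers healthy.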
