There are errors in your data node log, and the error times match the error times in the RS log.

--Sent from my Sony mobile.
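To line the two logs up side by side, here is a minimal correlation sketch in Java. It parses the "yyyy-MM-dd HH:mm:ss,SSS" timestamp prefix that both daemons use and prints WARN/ERROR/FATAL lines from the two files that land within a few seconds of each other. The file names and the 5-second window are assumptions for illustration only, not part of either project.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.List;

    public class LogCorrelator {
        // Both Hadoop and HBase logs start lines with this timestamp format.
        private static final SimpleDateFormat TS =
                new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");

        public static void main(String[] args) throws IOException {
            // Hypothetical paths; substitute your actual datanode and RS logs.
            List<String> dn = Files.readAllLines(Paths.get("datanode.log"));
            List<String> rs = Files.readAllLines(Paths.get("regionserver.log"));
            // O(n^2) scan is fine for a one-off diagnostic on modest logs.
            for (String d : dn) {
                Long dt = tsOf(d);
                if (dt == null || !isProblem(d)) continue;
                for (String r : rs) {
                    Long rt = tsOf(r);
                    if (rt != null && isProblem(r) && Math.abs(dt - rt) <= 5_000) {
                        System.out.println("DN: " + d + "\nRS: " + r + "\n");
                    }
                }
            }
        }

        private static boolean isProblem(String line) {
            return line.contains(" WARN ") || line.contains(" ERROR ")
                    || line.contains(" FATAL ");
        }

        // Returns epoch millis of the leading timestamp, or null if absent.
        private static Long tsOf(String line) {
            if (line.length() < 23) return null;
            try {
                return TS.parse(line.substring(0, 23)).getTime();
            } catch (ParseException e) {
                return null;
            }
        }
    }

Run against the full logs linked in the thread below, this pairs the datanode's 05:12:51 DataXceiver error with the region server's sync failures in the same second.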
On Jun 5, 2013 5:06 PM, "Vimal Jain" <vkj...@gmail.com> wrote:

> I don't think so, as I don't find any issues in the data node logs.
> Also, there are a lot of exceptions like "session expired" and "slept
> more than configured time". What are these?
>
> On Wed, Jun 5, 2013 at 2:27 PM, Azuryy Yu <azury...@gmail.com> wrote:
>
> > Because your data node 192.168.20.30 broke down, which led to the RS
> > going down.
> >
> > On Wed, Jun 5, 2013 at 3:19 PM, Vimal Jain <vkj...@gmail.com> wrote:
> >
> > > Here is the complete log:
> > >
> > > http://bin.cakephp.org/saved/103001 - Hregion
> > > http://bin.cakephp.org/saved/103000 - Hmaster
> > > http://bin.cakephp.org/saved/103002 - Datanode
> > >
> > > On Wed, Jun 5, 2013 at 11:58 AM, Vimal Jain <vkj...@gmail.com> wrote:
> > >
> > > > Hi,
> > > > I have set up HBase in pseudo-distributed mode.
> > > > It was working fine for 6 days, but this morning both the HMaster
> > > > and HRegionServer processes suddenly went down.
> > > > I checked the logs of both Hadoop and HBase.
> > > > Please help here.
> > > > Here are the snippets:
> > > >
> > > > *Datanode logs:*
> > > > 2013-06-05 05:12:51,436 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for block blk_1597245478875608321_2818 java.io.EOFException: while trying to read 2347 bytes
> > > > 2013-06-05 05:12:51,442 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_1597245478875608321_2818 received exception java.io.EOFException: while trying to read 2347 bytes
> > > > 2013-06-05 05:12:51,442 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.20.30:50010, storageID=DS-1816106352-192.168.20.30-50010-1369314076237, infoPort=50075, ipcPort=50020):DataXceiver
> > > > java.io.EOFException: while trying to read 2347 bytes
> > > >
> > > > *HRegion logs:*
> > > > 2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4694929ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-06-05 05:12:51,045 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_1597245478875608321_2818 java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.20.30:44333 remote=/192.168.20.30:50010]
> > > > 2013-06-05 05:12:51,046 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 11695345ms instead of 10000000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-06-05 05:12:51,048 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_1597245478875608321_2818 bad datanode[0] 192.168.20.30:50010
> > > > 2013-06-05 05:12:51,075 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
> > > > java.io.IOException: All datanodes 192.168.20.30:50010 are bad. Aborting...
> > > >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3096)
> > > > 2013-06-05 05:12:51,110 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
> > > > java.io.IOException: Reflection
> > > > Caused by: java.lang.reflect.InvocationTargetException
> > > > Caused by: java.io.IOException: DFSOutputStream is closed
> > > > 2013-06-05 05:12:51,180 FATAL org.apache.hadoop.hbase.regionserver.wal.HLog: Could not sync. Requesting close of hlog
> > > > java.io.IOException: Reflection
> > > > Caused by: java.lang.reflect.InvocationTargetException
> > > > Caused by: java.io.IOException: DFSOutputStream is closed
> > > > 2013-06-05 05:12:51,183 ERROR org.apache.hadoop.hbase.regionserver.wal.HLog: Failed close of HLog writer
> > > > java.io.IOException: Reflection
> > > > Caused by: java.lang.reflect.InvocationTargetException
> > > > Caused by: java.io.IOException: DFSOutputStream is closed
> > > > 2013-06-05 05:12:51,184 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Riding over HLog close failure! error count=1
> > > > 2013-06-05 05:12:52,557 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server hbase.rummycircle.com,60020,1369877672964: regionserver:60020-0x13ef31264d00001 regionserver:60020-0x13ef31264d00001 received expired from ZooKeeper, aborting
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
> > > > 2013-06-05 05:12:52,557 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
> > > > 2013-06-05 05:12:52,621 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: SplitLogWorker interrupted while waiting for task, exiting: java.lang.InterruptedException
> > > > java.io.InterruptedIOException: Aborting compaction of store cfp_info in region event_data,244630,1369879570539.3ebddcd11a3c22585a690bf40911cb1e. because user requested stop.
> > > > 2013-06-05 05:12:53,425 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
> > > > 2013-06-05 05:12:55,426 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
> > > > 2013-06-05 05:12:59,427 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
> > > > 2013-06-05 05:13:07,427 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
> > > > 2013-06-05 05:13:07,427 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 3 retries
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/hbase.rummycircle.com,60020,1369877672964
> > > >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> > > >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > > > 2013-06-05 05:13:07,436 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logs/hbase.rummycircle.com,60020,1369877672964/hbase.rummycircle.com%2C60020%2C1369877672964.1370382721642 : java.io.IOException: All datanodes 192.168.20.30:50010 are bad. Aborting...
> > > > java.io.IOException: All datanodes 192.168.20.30:50010 are bad. Aborting...
> > > >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3096)
> > > >
> > > > *HMaster logs:*
> > > > 2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4702394ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4988731ms instead of 300000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4988726ms instead of 300000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-06-05 05:12:50,701 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4698291ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-06-05 05:12:50,711 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4694502ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-06-05 05:12:50,714 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4694492ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-06-05 05:12:50,715 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 4695589ms instead of 60000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > > > 2013-06-05 05:12:52,263 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
> > > > 2013-06-05 05:12:52,465 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 0 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
> > > > 2013-06-05 05:12:52,561 ERROR org.apache.hadoop.hbase.master.HMaster: Region server hbase.rummycircle.com,60020,1369877672964 reported a fatal error:
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
> > > > 2013-06-05 05:12:53,970 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 1506 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
> > > > 2013-06-05 05:12:55,476 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 3012 ms, expecting minimum of 1, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
> > > > 2013-06-05 05:12:56,981 INFO org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region servers count to settle; checked in 1, slept for 4517 ms, expecting minimum of 1, maximum of 2147483647, master is running.
> > > > 2013-06-05 05:12:57,019 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of -ROOT-,,0 at address=hbase.rummycircle.com,60020,1369877672964; java.io.EOFException
> > > > 2013-06-05 05:17:52,302 WARN org.apache.hadoop.hbase.master.SplitLogManager: error while splitting logs in [hdfs://192.168.20.30:9000/hbase/.logs/hbase.rummycircle.com,60020,1369877672964-splitting] installed = 19 but only 0 done
> > > > 2013-06-05 05:17:52,321 FATAL org.apache.hadoop.hbase.master.HMaster: master:60000-0x13ef31264d00000 master:60000-0x13ef31264d00000 received expired from ZooKeeper, aborting
> > > > org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
> > > > java.io.IOException: Giving up after tries=1
> > > > Caused by: java.lang.InterruptedException: sleep interrupted
> > > > 2013-06-05 05:17:52,381 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
> > > > java.lang.RuntimeException: HMaster Aborted
> > > >
> > > > --
> > > > Thanks and Regards,
> > > > Vimal Jain
> > >
> > > --
> > > Thanks and Regards,
> > > Vimal Jain
>
> --
> Thanks and Regards,
> Vimal Jain
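About the "We slept Xms instead of Yms" warnings quoted above: HBase's Sleeper utility sleeps for a fixed period and warns when it wakes up far later than it asked, because that almost always means the whole JVM was stopped, typically by a long garbage-collection pause (or swapping). While the process is stopped it cannot heartbeat ZooKeeper, so once the pause outlives the ZooKeeper session timeout the session is expired and the region server must abort, which is exactly the KeeperException$SessionExpiredException sequence in these logs; the 63000 ms DFSClient socket timeout against the local datanode is the same stall seen from the HDFS side. A minimal sketch of the oversleep check (not HBase's actual class; the period and slack values here are assumptions):

    // Sketch of the idea behind HBase's "We slept Xms instead of Yms"
    // warning: if a fixed sleep returns far too late, the JVM was stopped
    // (usually GC), not merely slow. Runs until killed.
    public class PauseDetector {
        public static void main(String[] args) throws InterruptedException {
            final long periodMs = 3000;  // intended sleep, like the 3000ms above
            final long slackMs = 1000;   // tolerated scheduling jitter (assumption)
            while (true) {
                long start = System.currentTimeMillis();
                Thread.sleep(periodMs);
                long slept = System.currentTimeMillis() - start;
                if (slept > periodMs + slackMs) {
                    // A pause this long also stops ZooKeeper heartbeats; if it
                    // exceeds the ZK session timeout, the session is expired.
                    System.err.printf(
                            "We slept %dms instead of %dms, likely a GC pause%n",
                            slept, periodMs);
                }
            }
        }
    }

Here the region server slept 4694929 ms, roughly 78 minutes, instead of 3000 ms, which dwarfs any plausible session timeout, so the expiry and abort were unavoidable; the fix is to find the pause itself (GC tuning, heap sizing, or swapping) rather than to raise zookeeper.session.timeout.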