The exception originated from the Web UI, corresponding to HBaseAdmin.listTables(). At that moment, the master was unable to process the request; it simply needed more time.
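Since the exception just means the master was still initializing, a client hitting it would normally poll and retry rather than give up. A minimal sketch of that polling loop, assuming a hypothetical `check` callable (e.g. a wrapper around an isMasterRunning call that swallows connection errors; not something from this thread):

```python
import time

def wait_for_master(check, timeout_s=120.0, interval_s=2.0):
    """Poll a zero-argument liveness check until it returns True
    or the timeout expires. Returns True if the master came up."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False

# Toy usage: a check that fails twice, then succeeds.
attempts = iter([False, False, True])
print(wait_for_master(lambda: next(attempts), timeout_s=10.0, interval_s=0.01))  # → True
```

The interval and timeout values here are arbitrary; in practice they would be tuned to how long master initialization takes in your cluster.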
Cheers

On Sun, May 17, 2015 at 11:03 PM, Louis Hust <louis.h...@gmail.com> wrote:

> Yes, Ted, can you tell me what the following exception means in
> l-namenode1.log?
>
> 2015-05-15 12:16:40,522 INFO [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0]
> handler.ServerShutdownHandler: Finished processing of shutdown of
> l-hbase31.data.cn8.qunar.com,60020,1427789773001
> 2015-05-15 12:17:11,301 WARN [686544788@qtp-660252776-212]
> client.HConnectionManager$HConnectionImplementation: Checking master
> connection
>
> Does this mean the cluster was not operational?
>
> 2015-05-18 11:45 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>
> > After l-namenode1 became the active master, it assigned regions:
> >
> > 2015-05-15 12:16:40,432 INFO [master:l-namenode1:60000]
> > master.RegionStates: Transitioned {6f806bb62b347c992cd243fc909276ff
> > state=OFFLINE, ts=1431663400432, server=null} to
> > {6f806bb62b347c992cd243fc909276ff state=OPEN, ts=1431663400432,
> > server=l-hbase31.data.cn8.qunar.com,60020,1431462584879}
> >
> > However, l-hbase31 went down:
> >
> > 2015-05-15 12:16:40,508 INFO [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0]
> > handler.ServerShutdownHandler: Splitting logs for
> > l-hbase31.data.cn8.qunar.com,60020,1427789773001 before assignment.
> >
> > l-namenode1 was restarted:
> >
> > 2015-05-15 12:20:25,322 INFO [main] util.VersionInfo: HBase 0.96.0-hadoop2
> > 2015-05-15 12:20:25,323 INFO [main] util.VersionInfo: Subversion
> > https://svn.apache.org/repos/asf/hbase/branches/0.96 -r 1531434
> >
> > However, it went down due to zookeeper session expiration:
> >
> > 2015-05-15 12:20:25,580 WARN [main] zookeeper.ZooKeeperNodeTracker: Can't
> > get or delete the master znode
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for /hbase/master
> >
> > It started again after that and AssignmentManager did a lot of assignments.
> >
> > Looks like the cluster was operational this time.
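The whole diagnosis in this thread comes down to lining up log timestamps (restart, log splitting, session expiry, the first client warning). The leading timestamp of an HBase log4j line can be parsed with the stdlib; a small sketch, using two lines quoted above:

```python
from datetime import datetime

# HBase log4j timestamp prefix, e.g. "2015-05-15 12:16:40,522"
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

def parse_ts(line):
    """Extract the leading 23-character timestamp from a log4j-style line."""
    return datetime.strptime(line[:23], LOG_TS_FORMAT)

# Sample lines quoted in the thread (messages truncated).
shutdown_done = "2015-05-15 12:16:40,522 INFO ... Finished processing of shutdown"
conn_warning  = "2015-05-15 12:17:11,301 WARN ... Checking master connection"

gap = parse_ts(conn_warning) - parse_ts(shutdown_done)
print(gap.total_seconds())  # → 30.779
```

Note that `%f` accepts the 3-digit millisecond field log4j emits, interpreting it as 522000 microseconds.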
> > Cheers
> >
> > On Sun, May 17, 2015 at 8:24 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > bq. the backup master take over at 2015-05-15 12:16:40,024 ?
> > >
> > > The switch of active master should be earlier than 12:16:40,024 -
> > > shortly after 12:15:58.
> > >
> > > l-namenode1 would do some initialization (such as waiting for region
> > > servers count to settle) after it became active master.
> > >
> > > I tried to download from http://pan.baidu.com/s/1eQlKXj0 (at home) but
> > > the download progress was very slow.
> > >
> > > Will try downloading later in the day.
> > >
> > > Do you have access to pastebin?
> > >
> > > Cheers
> > >
> > > On Sun, May 17, 2015 at 2:07 AM, Louis Hust <louis.h...@gmail.com> wrote:
> > >
> > >> Hi, Ted,
> > >>
> > >> Thanks for your reply!
> > >>
> > >> I found this in the l-namenode2.dba.cn8 log during the restart:
> > >>
> > >> 2015-05-15 12:11:36,540 INFO [master:l-namenode2:60000]
> > >> master.ServerManager: Finished waiting for region servers count to settle;
> > >> checked in 5, slept for 4511 ms, expecting minimum of 1, maximum of
> > >> 2147483647, master is running.
> > >>
> > >> So this means the HMaster was ready to handle requests at 12:11:36?
> > >>
> > >> The backup master is l-namenode1.dba.cn8 and you can get the log at:
> > >>
> > >> http://pan.baidu.com/s/1eQlKXj0
> > >>
> > >> After l-namenode2.dba.cn8 was stopped by me at 12:15:58,
> > >> the backup master l-namenode1 took over, and I found this log:
> > >>
> > >> 2015-05-15 12:16:40,024 INFO [master:l-namenode1:60000]
> > >> master.ServerManager: Finished waiting for region servers count to settle;
> > >> checked in 4, slept for 5663 ms, expecting minimum of 1, maximum of
> > >> 2147483647, master is running.
> > >>
> > >> So the backup master took over at 2015-05-15 12:16:40,024?
> > >>
> > >> But it seems l-namenode2 was not working normally, given this exception
> > >> in the log:
> > >>
> > >> 2015-05-15 12:16:40,522 INFO
> > >> [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0]
> > >> handler.ServerShutdownHandler: Finished processing of shutdown of
> > >> l-hbase31.data.cn8.qunar.com,60020,1427789773001
> > >> 2015-05-15 12:17:11,301 WARN [686544788@qtp-660252776-212]
> > >> client.HConnectionManager$HConnectionImplementation: Checking master
> > >> connection
> > >> com.google.protobuf.ServiceException: java.net.ConnectException:
> > >> Connection refused
> > >>     at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
> > >>     at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> > >>     at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
> > >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
> > >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
> > >>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
> > >>
> > >> Does the exception mean the HMaster was not working normally, or
> > >> something else?
> > >>
> > >> 2015-05-17 11:06 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> > >>
> > >> > bq. the HMaster is handling two region server down, and not ready to
> > >> > handle client request?
> > >> >
> > >> > I didn't mean that - for a functioning master, handling region server
> > >> > shutdown is part of the master's job.
> > >> >
> > >> > You should see something similar to the following in a (functioning)
> > >> > master log:
> > >> >
> > >> > 2015-05-13 04:06:36,266 INFO [master:c6401:60000] master.ServerManager:
> > >> > Finished waiting for region servers count to settle; checked in 1,
> > >> > slept for 71582 ms, expecting minimum of 1, maximum of 2147483647,
> > >> > master is running.
> > >> >
> > >> > bq. wait the backend HMaster to take over
> > >> >
> > >> > Was there an exception in the backup master log after it took over?
> > >> >
> > >> > On Sat, May 16, 2015 at 6:44 PM, Louis Hust <louis.h...@gmail.com> wrote:
> > >> >
> > >> > > Hi, Ted,
> > >> > >
> > >> > > Thanks very much!
> > >> > >
> > >> > > The Namenode process was not running on l-namenode2.dba.cn8
> > >> > > (192.168.39.22); only the HMaster ran on it.
> > >> > > So you mean that at 2015-05-15 12:15:04 the HMaster was handling two
> > >> > > region servers going down, and was not ready to handle client
> > >> > > requests? And how can I tell from the logs when the HMaster is ready
> > >> > > to handle client requests?
> > >> > >
> > >> > > I stopped the HMaster at 12:15:58 because it could not handle
> > >> > > requests, so I wanted to stop it and wait for the backup HMaster to
> > >> > > take over.
> > >> > >
> > >> > > 2015-05-17 0:29 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> > >> > >
> > >> > > > In the period you identified, the master was assigning regions,
> > >> > > > e.g.
> > >> > > >
> > >> > > > 2015-05-15 12:13:09,683 INFO
> > >> > > > [l-namenode2.dba.cn8.qunar.com,60000,1431663090427-GeneralBulkAssigner-0]
> > >> > > > master.RegionStates: Transitioned {c634280ce287b2d6cebd88b61accf685
> > >> > > > state=OFFLINE, ts=1431663189621, server=null} to
> > >> > > > {c634280ce287b2d6cebd88b61accf685 state=PENDING_OPEN,
> > >> > > > ts=1431663189683,
> > >> > > > server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > >> > > > 2015-05-15 12:13:09,683 INFO
> > >> > > > [l-namenode2.dba.cn8.qunar.com,60000,1431663090427-GeneralBulkAssigner-2]
> > >> > > > master.RegionStates: Transitioned {2f60b1b4e51d32ef98ad19690f13a565
> > >> > > > state=OFFLINE, ts=1431663189621, server=null} to
> > >> > > > {2f60b1b4e51d32ef98ad19690f13a565 state=PENDING_OPEN,
> > >> > > > ts=1431663189683,
> > >> > > > server=l-hbase30.data.cn8.qunar.com,60020,1431462562233}
> > >> > > >
> > >> > > > Then two region servers went down:
> > >> > > >
> > >> > > > 2015-05-15 12:14:40,699 INFO [main-EventThread]
> > >> > > > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> > >> > > > processing expiration [l-hbase27.data.cn8.qunar.com,60020,1431663208899]
> > >> > > > 2015-05-15 12:15:04,899 INFO [main-EventThread]
> > >> > > > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> > >> > > > processing expiration [l-hbase25.data.cn8.qunar.com,60020,1431663193865]
> > >> > > >
> > >> > > > The master was stopped afterwards:
> > >> > > >
> > >> > > > Fri May 15 12:15:58 CST 2015 Terminating master
> > >> > > >
> > >> > > > The Namenode process was running on l-namenode2.dba.cn8, right?
> > >> > > >
> > >> > > > Cheers
> > >> > > >
> > >> > > > On Sat, May 16, 2015 at 7:50 AM, Louis Hust <louis.h...@gmail.com> wrote:
> > >> > > >
> > >> > > > > Hi, Ted,
> > >> > > > > Any idea?
> > >> > > > > When the HMaster restarts, how can I know when it can really
> > >> > > > > handle requests from the application? Is there any marker in the
> > >> > > > > logs?
> > >> > > > >
> > >> > > > > 2015-05-16 14:05 GMT+08:00 Louis Hust <louis.h...@gmail.com>:
> > >> > > > >
> > >> > > > > > @Ted,
> > >> > > > > > Please see the log from 12:11:29 to 12:15:28. In this time
> > >> > > > > > range the HMaster was in its restarting stage, but could not
> > >> > > > > > handle requests from clients? Was the HMaster recovering, or
> > >> > > > > > doing something else?
> > >> > > > > >
> > >> > > > > > 2015-05-16 13:59 GMT+08:00 Louis Hust <louis.h...@gmail.com>:
> > >> > > > > >
> > >> > > > > >> OK, you can get the log from
> > >> > > > > >> http://pan.baidu.com/s/1pqS6E
> > >> > > > > >>
> > >> > > > > >> 2015-05-16 13:26 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> > >> > > > > >>
> > >> > > > > >>> Can you check the server log on 192.168.39.22
> > >> > > > > >>> <http://l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000>?
> > >> > > > > >>>
> > >> > > > > >>> That should give you some clue.
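The log marker Ted points at for "is the master ready" is the ServerManager line "Finished waiting for region servers count to settle". Scanning a master log for it can be mechanized; a sketch using stdlib regex over two lines quoted in this thread:

```python
import re

# Matches the readiness marker and captures its timestamp and the number
# of region servers that checked in.
READY_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"master\.ServerManager: Finished waiting for region servers count to "
    r"settle; checked in (?P<count>\d+)"
)

def find_ready_marker(lines):
    """Return (timestamp, region-server count) of the first readiness line,
    or None if the marker never appears."""
    for line in lines:
        m = READY_RE.match(line)
        if m:
            return m.group("ts"), int(m.group("count"))
    return None

# Sample lines taken from logs quoted in the thread.
log = [
    "2015-05-15 12:13:33,139 INFO [AM.ZK.Worker-pool2-t4] master.RegionStates: Onlined 9036a3befee90eeffb9082f90a4a9afa on l-hbase26.data.cn8.qunar.com,60020,1431462615651",
    "2015-05-15 12:11:36,540 INFO [master:l-namenode2:60000] master.ServerManager: Finished waiting for region servers count to settle; checked in 5, slept for 4511 ms, expecting minimum of 1, maximum of 2147483647, master is running.",
]
print(find_ready_marker(log))  # → ('2015-05-15 12:11:36,540', 5)
```

As the thread itself shows, this marker means initialization finished at that moment; the master can still become unresponsive afterwards, so treat it as necessary rather than sufficient evidence of health.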
> > >> > > > > >>>
> > >> > > > > >>> Cheers
> > >> > > > > >>>
> > >> > > > > >>> On Fri, May 15, 2015 at 8:22 PM, Louis Hust <louis.h...@gmail.com> wrote:
> > >> > > > > >>>
> > >> > > > > >>> > Hi all,
> > >> > > > > >>> >
> > >> > > > > >>> > I use HBase 0.96.0 with Hadoop 2.2.0, and the customer said
> > >> > > > > >>> > they could not write into the HBase cluster, so I stopped
> > >> > > > > >>> > the HMaster and started it again soon after.
> > >> > > > > >>> >
> > >> > > > > >>> > But it seems that the HMaster did not respond to requests.
> > >> > > > > >>> > The following is the HMaster log:
> > >> > > > > >>> >
> > >> > > > > >>> > {logs}
> > >> > > > > >>> > 2015-05-15 12:13:33,136 INFO [AM.ZK.Worker-pool2-t16]
> > >> > > > > >>> > master.RegionStates: Transitioned {9036a3befee90eeffb9082f90a4a9afa
> > >> > > > > >>> > state=OPENING, ts=1431663212637,
> > >> > > > > >>> > server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > >> > > > > >>> > to {9036a3befee90eeffb9082f90a4a9afa state=OPEN, ts=1431663213136,
> > >> > > > > >>> > server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > >> > > > > >>> > 2015-05-15 12:13:33,139 INFO [AM.ZK.Worker-pool2-t4]
> > >> > > > > >>> > master.RegionStates: Onlined 9036a3befee90eeffb9082f90a4a9afa on
> > >> > > > > >>> > l-hbase26.data.cn8.qunar.com,60020,1431462615651
> > >> > > > > >>> > 2015-05-15 12:14:40,699 INFO [main-EventThread]
> > >> > > > > >>> > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> > >> > > > > >>> > processing expiration [l-hbase27.data.cn8.qunar.com,60020,1431663208899]
> > >> > > > > >>> > 2015-05-15 12:15:04,899 INFO [main-EventThread]
> > >> > > > > >>> > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> > >> > > > > >>> > processing expiration [l-hbase25.data.cn8.qunar.com,60020,1431663193865]
> > >> > > > > >>> > 2015-05-15 12:15:24,465 WARN [249240421@qtp-591022857-33]
> > >> > > > > >>> > client.HConnectionManager$HConnectionImplementation: Checking master
> > >> > > > > >>> > connection
> > >> > > > > >>> > com.google.protobuf.ServiceException: java.net.SocketTimeoutException:
> > >> > > > > >>> > Call to l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because
> > >> > > > > >>> > java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> > >> > > > > >>> > channel to be ready for read. ch :
> > >> > > > > >>> > java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700
> > >> > > > > >>> > remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > >> > > > > >>> >     at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
> > >> > > > > >>> >     at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> > >> > > > > >>> >     at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
> > >> > > > > >>> >     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
> > >> > > > > >>> >     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
> > >> > > > > >>> >     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
> > >> > > > > >>> >     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:2531)
> > >> > > > > >>> >     at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:298)
> > >> > > > > >>> >     at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:530)
> > >> > > > > >>> >     at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:255)
> > >> > > > > >>> >     at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:382)
> > >> > > > > >>> >     at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:372)
> > >> > > > > >>> >     at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:95)
> > >> > > > > >>> >     at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
> > >> > > > > >>> >     at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
> > >> > > > > >>> >     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> > >> > > > > >>> >     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> > >> > > > > >>> >     at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
> > >> > > > > >>> >     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > >> > > > > >>> >     at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
> > >> > > > > >>> >     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > >> > > > > >>> >     at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> > >> > > > > >>> >     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > >> > > > > >>> >     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> > >> > > > > >>> >     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > >> > > > > >>> >     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> > >> > > > > >>> >     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > >> > > > > >>> >     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> > >> > > > > >>> >     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > >> > > > > >>> >     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > >> > > > > >>> >     at org.mortbay.jetty.Server.handle(Server.java:326)
> > >> > > > > >>> >     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> > >> > > > > >>> >     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> > >> > > > > >>> >     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> > >> > > > > >>> >     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> > >> > > > > >>> >     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> > >> > > > > >>> >     at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> > >> > > > > >>> >     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> > >> > > > > >>> > Caused by: java.net.SocketTimeoutException: Call to
> > >> > > > > >>> > l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because
> > >> > > > > >>> > java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> > >> > > > > >>> > channel to be ready for read. ch :
> > >> > > > > >>> > java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700
> > >> > > > > >>> > remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > >> > > > > >>> >     at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
> > >> > > > > >>> >     at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
> > >> > > > > >>> >     at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
> > >> > > > > >>> >     ... 37 more
> > >> > > > > >>> > Caused by: java.net.SocketTimeoutException: 60000 millis timeout while
> > >> > > > > >>> > waiting for channel to be ready for read. ch :
> > >> > > > > >>> > java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700
> > >> > > > > >>> > remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > >> > > > > >>> >     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> > >> > > > > >>> >     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> > >> > > > > >>> >     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> > >> > > > > >>> >     at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > >> > > > > >>> >     at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > >> > > > > >>> >     at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:553)
> > >> > > > > >>> >     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> > >> > > > > >>> >     at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> > >> > > > > >>> >     at java.io.DataInputStream.readInt(DataInputStream.java:387)
> > >> > > > > >>> >     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1057)
> > >> > > > > >>> >     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:719)
> > >> > > > > >>> > Fri May 15 12:15:58 CST 2015 Terminating master
> > >> > > > > >>> > {/logs}
> > >> > > > > >>> >
> > >> > > > > >>> > So what does the exception mean? Why did it happen, and how
> > >> > > > > >>> > can the problem be solved?
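The thread shows two distinct failure modes when the Web UI checks the master connection: a SocketTimeoutException (the port accepted the connection, but the master was too busy or still initializing to answer within 60 s) and, later, a ConnectException: Connection refused (nothing listening at all). A sketch that tells them apart from log text; the classification mapping is my reading of this thread, not an HBase API:

```python
def classify_master_check_failure(log_text):
    """Rough classification of the 'Checking master connection' failures
    seen in this thread. Heuristic string matching, not an HBase API."""
    if "java.net.ConnectException" in log_text:
        # Nothing listening on the master port: process down or not yet started.
        return "master process not reachable (connection refused)"
    if "java.net.SocketTimeoutException" in log_text:
        # Connection accepted but no reply within the RPC timeout:
        # master process up, but busy or still initializing.
        return "master up but not responding (timeout)"
    return "unknown"

print(classify_master_check_failure(
    "com.google.protobuf.ServiceException: java.net.SocketTimeoutException: "
    "Call to l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed"
))  # → master up but not responding (timeout)
```

Note the ConnectException branch is checked first, since a refused connection is the stronger signal that the process itself is gone.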