As Bryan said.

On 5 March 2015 at 17:55, "Bryan Beaudreault" <bbeaudrea...@hubspot.com> wrote:
> You should run with a backup master in a production cluster. The failover
> process works very well and will cause no downtime. I've done it literally
> hundreds of times across our multiple production HBase clusters.
>
> Even if you don't have a backup master, you should still be fine with
> restarting the master. It can handle a brief blip without any problems,
> from what I've seen. The master is really only used for coordination such
> as region moves, RS failovers, etc. Your clients can still retrieve data
> from your regionservers, as long as no servers die in the brief moment you
> are masterless.
>
> On Thu, Mar 5, 2015 at 5:53 AM, Sandeep Reddy <sandeepvre...@outlook.com>
> wrote:
>
> > Since ours is a production cluster, we can't restart the master.
> > In our test cluster I tested this scenario, and it was resolved after
> > restarting the master.
> > Other than restarting the master, I couldn't find any solution.
> > Thanks, Sandeep.
> >
> > > From: nkey...@gmail.com
> > > Date: Wed, 4 Mar 2015 14:55:03 +0100
> > > Subject: Re: Where is HBase failed servers list stored
> > > To: user@hbase.apache.org
> > >
> > > If I understand the issue correctly, restarting the master should solve
> > > the problem.
> > >
> > > On Wed, Mar 4, 2015 at 5:55 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > Please see HBASE-13067 Fix caching of stubs to allow IP address
> > > > changes of restarted remote servers
> > > >
> > > > Cheers
> > > >
> > > > On Tue, Mar 3, 2015 at 8:26 PM, Sandeep L <sandeepvre...@outlook.com>
> > > > wrote:
> > > >
> > > > > Hi nkeywal,
> > > > > While trying to get more details about this issue, I found that
> > > > > HMaster is trying to connect to the wrong IP address.
> > > > > Here is the exact issue:
> > > > > Due to some unavoidable reason we were forced to change the IP address
> > > > > of the regionserver, and then updated the new IP address in the
> > > > > /etc/hosts file across all HBase servers.
> > > > > I started the RegionServer from the master with the start-hbase.sh
> > > > > script, and the jps output on the regionserver shows that the
> > > > > regionserver process is up and running.
> > > > > But when running the hbase balancer, HMaster tries to connect to the
> > > > > old IP address instead of the new IP address.
> > > > > One more thing: when I checked the regionserver status on port 60010,
> > > > > it shows as up and running.
> > > > > Thanks, Sandeep.
> > > > >
> > > > > > From: nkey...@gmail.com
> > > > > > Date: Tue, 3 Mar 2015 19:01:01 +0100
> > > > > > Subject: Re: Where is HBase failed servers list stored
> > > > > > To: user@hbase.apache.org
> > > > > >
> > > > > > It's in local memory. When HBase cannot connect to a server, it puts it
> > > > > > into the "failedServerList" for 2 seconds. This is to avoid having all
> > > > > > the threads going into a potentially long socket timeout. Are you sure
> > > > > > that you can connect from the master to this machine/port?
> > > > > >
> > > > > > You can change the time it stays in the list with
> > > > > > hbase.ipc.client.failed.servers.expiry (in milliseconds), but it should
> > > > > > not help.
> > > > > >
> > > > > > You should have another exception before this one in the logs (the one
> > > > > > that initially put this region server in this failedServerList).
> > > > > >
> > > > > > On Tue, Mar 3, 2015 at 12:08 PM, Sandeep L <sandeepvre...@outlook.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > > While trying to run the hbase balancer I am getting the error message
> > > > > > > "This server is in the failed servers list". Because of this, the
> > > > > > > cluster is not getting balanced.
> > > > > > > Even though the regionserver is up and running, HMaster is unable to
> > > > > > > connect to it.
> > > > > > > The odd thing here is that HMaster is able to start the regionserver,
> > > > > > > and it is detected as up and running, but HMaster is unable to assign
> > > > > > > regions.
> > > > > > > Can someone suggest a solution for this?
> > > > > > > Following is the full stack trace:
> > > > > > > org.apache.hadoop.hbase.ipc.RpcClient$FailedServerException: This server is in the failed servers list: host1/192.168.2.20:60020
> > > > > > >     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:853)
> > > > > > >     at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
> > > > > > >     at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
> > > > > > >     at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
> > > > > > >     at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
> > > > > > >     at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:20964)
> > > > > > >     at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:671)
> > > > > > >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:2097)
> > > > > > >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1577)
> > > > > > >     at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1550)
> > > > > > >     at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:104)
> > > > > > >     at org.apache.hadoop.hbase.master.AssignmentManager.handleRegion(AssignmentManager.java:999)
> > > > > > >     at org.apache.hadoop.hbase.master.AssignmentManager$6.run(AssignmentManager.java:1447)
> > > > > > >     at org.apache.hadoop.hbase.master.AssignmentManager$3.run(AssignmentManager.java:1260)
> > > > > > >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> > > > > > >     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> > > > > > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > > > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > > > > >     at java.lang.Thread.run(Thread.java:745)
> > > > > > > Thanks, Sandeep.
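
For readers following the thread, the behaviour nkeywal describes (a server that cannot be reached is kept in an in-memory "failed servers" list for a short time, 2 seconds by default per the message above, so that other threads fail fast instead of waiting on a socket timeout) can be sketched roughly as below. This is only an illustration under assumptions: the class name `FailedServersSketch` and its methods are hypothetical and are not HBase's actual implementation, and the clock is passed in explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a time-limited "failed servers" list.
 * Not HBase's real class; names and structure are assumptions for illustration.
 */
final class FailedServersSketch {
    // Maps an address key such as "host1/192.168.2.20:60020" to the time its entry expires.
    private final Map<String, Long> expiryByAddress = new HashMap<>();
    // How long an entry stays in the list, in milliseconds
    // (the thread mentions hbase.ipc.client.failed.servers.expiry for this).
    private final long expiryMillis;

    FailedServersSketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    /** Records a failed connection attempt to the given address at time nowMillis. */
    synchronized void addFailure(String address, long nowMillis) {
        expiryByAddress.put(address, nowMillis + expiryMillis);
    }

    /**
     * Returns true while the address is still inside its expiry window.
     * Expired entries are removed, so the next call would attempt a real connection.
     */
    synchronized boolean isFailed(String address, long nowMillis) {
        Long expiry = expiryByAddress.get(address);
        if (expiry == null) {
            return false;
        }
        if (nowMillis >= expiry) {
            expiryByAddress.remove(address);
            return false;
        }
        return true;
    }
}
```

With a 2000 ms expiry, a failure recorded at time 0 makes `isFailed` return true at 500 ms and false again from 2000 ms onward. That matches the point made in the thread: the list only reflects a recent connection failure, so raising the expiry would not help when the underlying cause is a connection attempt to a stale IP address.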