Hi Lu,

I remember having a similar issue with the wrong number of region servers being reported to the master. In my case it was a problem with reverse name resolution, so I think you should check your DNS settings and /etc/hosts. Try running ping -c 2 $HOSTNAME on the region server that is reported twice (10.27.17.251), and correct the file $HBASE_HOME/conf/regionservers so that it uses the hostname reported by that ping command.

You should also check: http://hbase.apache.org/book/os.html
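If it helps, here is a rough sketch of the same forward/reverse check in Python, using only the standard socket module. It is not part of HBase; the function names (short_name, names_agree, check_host, report) are my own, and treating a match on the short host name as "consistent" is an assumption that may be too lenient for some setups.

```python
import socket


def short_name(name):
    """Return the lowercased first label of a host name (not meant for IPs)."""
    return name.split(".")[0].lower()


def names_agree(hostname, reverse_names):
    """True if any reverse-lookup name matches hostname in full or short form."""
    wanted = hostname.lower()
    for candidate in reverse_names:
        candidate = candidate.lower()
        if candidate == wanted or short_name(candidate) == short_name(wanted):
            return True
    return False


def check_host(hostname):
    """Resolve hostname to an IP, then resolve that IP back to names."""
    ip = socket.gethostbyname(hostname)             # forward lookup
    primary, aliases, _ = socket.gethostbyaddr(ip)  # reverse lookup
    return ip, primary, names_agree(hostname, [primary] + aliases)


def report(hostname):
    """Return a one-line summary; lookup errors are reported, not raised."""
    try:
        ip, primary, ok = check_host(hostname)
    except OSError as exc:  # gaierror and herror are both OSError subclasses
        return f"{hostname}: lookup failed ({exc})"
    verdict = "consistent" if ok else "MISMATCH"
    return f"{hostname} -> {ip} -> {primary}: {verdict}"
```

On each region server you could run print(report(socket.gethostname())), and from the master something like print(report("wlu-rs1")). A MISMATCH, or a name that differs between machines, would point at the DNS or /etc/hosts entries for that host.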

On Thu, Feb 23, 2012 at 8:49 AM, Lu, Wei <[email protected]> wrote:

> Hi,
>
> I met with a weird problem when using HBase. There are 3 machines: 1
> master and 2 region servers (wlu-rs1/10.27.17.251 and wlu-rs2/10.27.16.11).
> But when I use "status 'detailed'" to see region servers' status, it show
> there are three server, and one server appears twice (exactly same).
>
> 3 live servers
>     10.27.17.251:60020 1329975187706
>     10.27.16.11:60020 1329975209046
>     10.27.17.251:60020 1329975187706
>
> When balance begins, region server 10.27.17.251 seems to move data from &
> to itself, and FATAL error occurs.
>
> Log info of HMaster:
>
> 2012-02-23 00:01:00,629 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=usertable,user172022781,1329972455493.943849e136aa6f7a343d47fed57da429., src=wlu-rs1,60020,1329968056162, dest=10.27.17.251,60020,1329968056162
> 2012-02-23 00:01:00,629 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region usertable,user172022781,1329972455493.943849e136aa6f7a343d47fed57da429. (offlining)
> 2012-02-23 00:01:09,712 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/ad483f3806a03756f3f47cd8bd220d09 (region=usertable,user819517397,1329972500402.ad483f3806a03756f3f47cd8bd220d09., server=wlu-rs1,60020,1329968056162, state=RS_ZK_REGION_CLOSING)
> 2012-02-23 00:01:09,712 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING, server=wlu-rs1,60020,1329968056162, region=ad483f3806a03756f3f47cd8bd220d09
> 2012-02-23 00:01:12,678 FATAL org.apache.hadoop.hbase.master.HMaster: Remote unexpected exception
> java.io.IOException: Call to /10.27.17.251:60020 failed on local exception: java.io.EOFException
>     at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:806)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>     at $Proxy6.closeRegion(Unknown Source)
>     at org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:601)
>     at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1123)
>     at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1070)
>     at org.apache.hadoop.hbase.master.AssignmentManager.balance(AssignmentManager.java:1930)
>     at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:694)
>     at org.apache.hadoop.hbase.master.HMaster$1.chore(HMaster.java:585)
>     at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.io.EOFException
>     at java.io.DataInputStream.readInt(Unknown Source)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:539)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:477)
> 2012-02-23 00:01:12,680 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> 2012-02-23 00:01:12,680 INFO org.apache.hadoop.hbase.master.HMaster: balance
>
> I use HBase0.90.3 and Hadoop0.20.2. Can anyone please help to figure this
> out?
>
> Regards,
> Wei
