I remember seeing this error in our deployment as well. Can you check your
GC logs to see if there are long GC pauses? Also look at your ZooKeeper logs
to see what's going on.
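
For the GC check, a rough Python sketch like the one below might help flag
long pauses. It is only an illustration, not something taken from our
deployment. It assumes the JVM was started with -XX:+PrintGCDetails, so each
collection ends with a "[Times: user=... sys=..., real=... secs]" entry. The
function name long_pauses and the 10-second default threshold are
placeholders I made up.

```python
import re

# Matches the wall-clock figure in entries such as
# "[Times: user=0.03 sys=0.00, real=12.41 secs]".
# Assumption: the log uses the -XX:+PrintGCDetails layout.
REAL_TIME = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def long_pauses(lines, threshold_secs=10.0):
    """Return (line_number, seconds) for each GC entry at or above the threshold.

    threshold_secs is a hypothetical default; pick a value relative to
    your own ZooKeeper session timeout.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = REAL_TIME.search(line)
        if match is None:
            continue  # not a GC timing entry
        seconds = float(match.group(1))
        if seconds >= threshold_secs:
            hits.append((number, seconds))
    return hits
```

You would call it with an open log file, e.g. long_pauses(open(path)), where
path is wherever your GC log lives. Pauses that approach or exceed the
ZooKeeper session timeout are the ones worth worrying about, since the
session can expire while the JVM is stopped.
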
I tried a bunch of things, so I'm not sure which one worked, but IIRC
increasing the ZooKeeper connection and timeout limits is what did the
trick.

Thanks

On Wed, Feb 22, 2012 at 10:55 PM, Lu, Wei <[email protected]> wrote:

> Hi,
>
> I ran into a weird problem when using HBase. There are 3 machines: 1
> master and 2 region servers (wlu-rs1/10.27.17.251 and
> wlu-rs2/10.27.16.11).
> But when I use "status 'detailed'" to see the region servers' status, it
> shows three servers, and one server appears twice (exactly the same).
> 3 live servers
> 10.27.17.251:60020 1329975187706
> 10.27.16.11:60020 1329975209046
> 10.27.17.251:60020 1329975187706
>
> When balancing begins, region server 10.27.17.251 seems to move data from
> and to itself, and a FATAL error occurs.
>
> Log info of HMaster:
>
> 2012-02-23 00:01:00,629 INFO org.apache.hadoop.hbase.master.HMaster:
> balance
> hri=usertable,user172022781,1329972455493.943849e136aa6f7a343d47fed57da429.,
> src=wlu-rs1,60020,1329968056162, dest=10.27.17.251,60020,1329968056162
> 2012-02-23 00:01:00,629 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of
> region
> usertable,user172022781,1329972455493.943849e136aa6f7a343d47fed57da429.
> (offlining)
> 2012-02-23 00:01:09,712 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned
> node: /hbase/unassigned/ad483f3806a03756f3f47cd8bd220d09
> (region=usertable,user819517397,1329972500402.ad483f3806a03756f3f47cd8bd220d09.,
> server=wlu-rs1,60020,1329968056162, state=RS_ZK_REGION_CLOSING)
> 2012-02-23 00:01:09,712 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Handling
> transition=RS_ZK_REGION_CLOSING, server=wlu-rs1,60020,1329968056162,
> region=ad483f3806a03756f3f47cd8bd220d09
> 2012-02-23 00:01:12,678 FATAL org.apache.hadoop.hbase.master.HMaster:
> Remote unexpected exception
> java.io.IOException: Call to /10.27.17.251:60020 failed on local
> exception: java.io.EOFException
>                at
> org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:806)
>                at
> org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
>                at
> org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>                at $Proxy6.closeRegion(Unknown Source)
>                at
> org.apache.hadoop.hbase.master.ServerManager.sendRegionClose(ServerManager.java:601)
>                at
> org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1123)
>                at
> org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1070)
>                at
> org.apache.hadoop.hbase.master.AssignmentManager.balance(AssignmentManager.java:1930)
>                at
> org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:694)
>                at
> org.apache.hadoop.hbase.master.HMaster$1.chore(HMaster.java:585)
>                at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
> Caused by: java.io.EOFException
>                at java.io.DataInputStream.readInt(Unknown Source)
>                at
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:539)
>                at
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:477)
> 2012-02-23 00:01:12,680 INFO org.apache.hadoop.hbase.master.HMaster:
> Aborting
> 2012-02-23 00:01:12,680 INFO org.apache.hadoop.hbase.master.HMaster:
> balance
>
>
> I use HBase 0.90.3 and Hadoop 0.20.2. Can anyone please help figure this
> out?
>
>
>
> Regards,
> Wei
>
>
