Hello,

We've recently had a problem where regions get stuck in transition
indefinitely. They never seem to leave transition unless we take manual
action. The last time this happened I restarted the master and they were
cleared out. This time I wanted to consult the list first.

I checked the admin UI for all 24 of our servers, and the region does not
appear to be hosted anywhere. If I look in HDFS, I do see the region there
and it has 2 files. The first mention of this region in my HMaster logs
is:

12/04/15 17:48:06 INFO master.HMaster: balance hri=visitor-activities-a2,\x00\x02EG120909,1333750824238.703fed4411f2d6ff4b3ea80506fb635e., src=XXXXXXXXX.ec2.internal,60020,1334064456919, dest=XXXXXXXX.ec2.internal,60020,1334064197946
12/04/15 17:48:06 INFO master.AssignmentManager: Server serverName=XXXXXXXX.ec2.internal,60020,1334064456919, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for visitor-activities-a2,\x00\x02EG120909,1333750824238.703fed4411f2d6ff4b3ea80506fb635e. but we are not serving it for 703fed4411f2d6ff4b3ea80506fb635e


The master then repeats the same few log entries roughly every 30 minutes:

12/04/15 18:18:18 INFO master.AssignmentManager: Regions in transition timed out: visitor-activities-a2,\x00\x02EG120909,1333750824238.703fed4411f2d6ff4b3ea80506fb635e. state=PENDING_CLOSE, ts=1334526491544, server=null
12/04/15 18:18:18 INFO master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=visitor-activities-a2,\x00\x02EG120909,1333750824238.703fed4411f2d6ff4b3ea80506fb635e.
12/04/15 18:18:18 INFO master.AssignmentManager: Server serverName=XXXXXXXXX.ec2.internal,60020,1334064456919, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for visitor-activities-a2,\x00\x02EG120909,1333750824238.703fed4411f2d6ff4b3ea80506fb635e. but we are not serving it for 703fed4411f2d6ff4b3ea80506fb635e


Any ideas how I can avoid this, or a better solution than restarting the
HMaster?

Thanks,

Bryan
