JIRA creation is done: https://issues.apache.org/jira/browse/HBASE-17791
I will have a look into it over the next few days, maybe I can come up
with a patch.
On Wed, Mar 15, 2017 at 6:14 AM, Stack wrote:
> File a blocker please Lars. I'm pretty sure the boolean on whether we are
> doing a recovery or not has been there a long time so yeah, a single server
> recovery could throw us off, but you make a practical point, that one
> server should not destroy locality over the cluster.
File a blocker please Lars. I'm pretty sure the boolean on whether we are
doing a recovery or not has been there a long time so yeah, a single server
recovery could throw us off, but you make a practical point, that one
server should not destroy locality over the cluster.
St.Ack
On Tue, Mar 14, 2
Wait, HBASE-15251 is not enough methinks. The checks added help, but
they do not cover all the possible edge cases. In particular, say a
node really fails: why not just reassign the few regions it did hold
and leave all the others where they are? Seems insane as it is.
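To make the point concrete, here is a toy sketch (hypothetical names, not the actual AssignmentManager API) of what "reassign only the dead server's regions, retain everything else" would mean:

```java
import java.util.*;

// Toy sketch, NOT HBase code: on a single server death, move only that
// server's regions and keep every other region where it was.
public class RetainOnFailure {
    static Map<String, String> replan(Map<String, String> plan,
                                      String deadServer,
                                      List<String> liveServers) {
        Map<String, String> next = new LinkedHashMap<>();
        int i = 0;
        for (Map.Entry<String, String> e : plan.entrySet()) {
            if (e.getValue().equals(deadServer)) {
                // Only regions from the dead server move; round-robin here
                // stands in for whatever balancer policy would really apply.
                next.put(e.getKey(), liveServers.get(i++ % liveServers.size()));
            } else {
                // Everything else keeps its old (local) assignment.
                next.put(e.getKey(), e.getValue());
            }
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, String> plan = new LinkedHashMap<>();
        plan.put("r1", "rs1");
        plan.put("r2", "rs2");
        plan.put("r3", "rs2");
        System.out.println(replan(plan, "rs1", Arrays.asList("rs2", "rs3")));
    }
}
```

Only r1 moves; r2 and r3 keep their servers and hence their locality.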
On Tue, Mar 14, 2017 at 2:24
Looking at the code more... it seems the issue is here
In AssignmentManager.processDeadServersAndRegionsInTransition():
  ...
  failoverCleanupDone();
  if (!failover) {
    // Fresh cluster startup.
    LOG.info("Clean cluster startup. Assigning user regions");
    assignAllUserRegions(allRegions);
  }
  ...
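For a sense of scale, a toy simulation (not HBase code): when the !failover branch assigns every region round-robin with no memory of prior locations, only about 1/numServers of the regions land back on their old server, which matches the drop to single-digit locality:

```java
import java.util.*;

// Toy simulation, NOT HBase code: a clean-startup style round-robin
// assignment ignores previous locations, so only ~1/numServers of the
// regions end up back on the server that holds their data locally.
public class ShuffleLocality {
    static Map<String, String> roundRobin(List<String> regions, List<String> servers) {
        Map<String, String> next = new LinkedHashMap<>();
        for (int i = 0; i < regions.size(); i++)
            next.put(regions.get(i), servers.get(i % servers.size()));
        return next;
    }

    // Count how many regions kept their previous server.
    static long retained(Map<String, String> oldPlan, Map<String, String> newPlan) {
        return oldPlan.entrySet().stream()
            .filter(e -> e.getValue().equals(newPlan.get(e.getKey())))
            .count();
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("rs1", "rs2", "rs3", "rs4", "rs5");
        Random rnd = new Random(42);
        List<String> regions = new ArrayList<>();
        Map<String, String> oldPlan = new LinkedHashMap<>();
        for (int i = 0; i < 1000; i++) {
            regions.add("region-" + i);
            oldPlan.put("region-" + i, servers.get(rnd.nextInt(servers.size())));
        }
        long kept = retained(oldPlan, roundRobin(regions, servers));
        // Expect roughly 200 of 1000 (about 1 in 5) to keep their old server.
        System.out.println("retained: " + kept + "/1000");
    }
}
```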
Hi,
I had this happen on multiple clusters recently, where after the
restart the locality dropped from close to or exactly 100% down to
single digits. The reason is that all regions were completely shuffled
and reassigned to random servers. Upon reading the (yet again
non-trivial) assignment code
Doh, https://issues.apache.org/jira/browse/HBASE-15251 addresses this
(though I am not sure exactly how, see below). This should be
backported to all 1.x branches!
As for the patch, I see this:
  if (!failover) {
    // Fresh cluster startup.
-   LOG.info("Clean cluster startup. Assigning user regions");