.META. may not come back online if > number of executors servers crash and one 
of those > number of executors was carrying meta
-------------------------------------------------------------------------------------------------------------------------------

                 Key: HBASE-3809
                 URL: https://issues.apache.org/jira/browse/HBASE-3809
             Project: HBase
          Issue Type: Bug
            Reporter: stack
            Priority: Critical
             Fix For: 0.92.0


This is a duplicate of another issue but at the moment I cannot find the 
original.

If you had a 700 node cluster and then you ran something on the cluster which 
killed 100 nodes, and .META. had been running on one of those downed nodes, 
well, you'll have all of your master executors processing ServerShutdowns and 
more than likely non of the currently processing executors will be servicing 
the shutdown of the server that was carrying .META.

Well, for server shutdown to complete at the moment, an online .META. is 
required.  So, in the above case, we'll be stuck. The current executors will 
not be able to clear to make space for the processing of the server carrying 
.META. because they need .META. to complete.

We can make the master handlers have no bound so it will expand to accomodate 
all crashed servers -- so it'll have the one .META. in its queue -- or we can 
change it so shutdown handling doesn't require .META. to be on-line (its used 
to figure the regions the server was carrying); we could use the master's 
in-memory picture of the cluster (But IIRC, there may be holes ....TBD)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to