Just to make it a little bit more complex, let me repeat what Nick already said in this thread:

"However, a detail of our region recovery process is that a region actually comes online for writes *before* it's available for reads. That is, it can recover into a state that is available-for-writes faster than it can recover into a state that is available-for-reads."

HBase allows you to write while it replays the edits in parallel (and stays strongly consistent). So for a write-only load the MTTR is: ZooKeeper failure detection + region allocation.

Nicolas

On Wed, Apr 8, 2015 at 4:12 PM, Marcelo Valle (BLOOMBERG/ LONDON) <mvallemil...@bloomberg.net> wrote:

> Thank you all a lot for the answers!
>
> From: este...@cloudera.com
> Subject: Re: write availability
>
> --
> Cloudera, Inc.
>
> On Tue, Apr 7, 2015 at 10:36 AM, Marcelo Valle (BLOOMBERG/ LONDON) <mvallemil...@bloomberg.net> wrote:
>
> > Sorry, there is something I asked wrongly because I was understanding it wrongly.
> > 1 region server corresponds to 1 namenode and 1 write to 1 name node will replicate to 3 datanodes...
>
> Not really, but I think we understood the failure mode you were curious to know more about :)
>
> > So to simplify the second question, what happens to the HBase cluster when 1 region server is down?
>
> The simple case is something like this: The HBase Master will get a notification from ZooKeeper that the znode for this RS has expired and will start the recovery process, which will look up the existing WALs on HDFS for this RS and will start the distributed log splitting of these WALs across the cluster. Once replaying the edits (writes) found in the WALs completes, the HBase Master will open the region on other RSs and reads and writes will be available for the clients immediately. With read replicas enabled, only writes will not be available until the log replay completes, and features like distributed log replay (HBASE-7006) can help to speed up the process.
> HBase provides other features like replication which can even help you further on HA and other disaster recovery scenarios.
>
> If you have more questions, please let us know!
> esteban.
>
> > -Marcelo
>
> From: Marcelo Valle (BLOOMBERG/ LONDON)
> Subject: Re: write availability
>
> Esteban,
>
> If I understood correctly what you said:
>
> "For the failure mode you mention if all DNs go down (not the NN) clients will be blocked waiting for the acknowledgement of a write to the DNs and after a few retries the RS will consider there was a failure writing to the WAL, the RS will attempt to roll the WAL for a last time and if that fails at this point the RS will consider this as a fatal condition and it will shut itself down. At this point the client probably ran out of retries and will throw an exception to the application."
>
> If this scenario happens, when will my application be available to accept writes for that region again? When I do some manual intervention on the server?
>
> For example: suppose I split data by user ids, so each user is stored in a different region. In the scenario above, my application (and also the HBase cluster) would be working for some users and wouldn't be working for users whose user id is in a "down region" (a region where all corresponding DNs are down, considering 1 DN per RS). Is this right?
>
> -Marcelo.
>
> From: este...@cloudera.com
> Subject: Re: write availability
>
> Hello Marcelo,
>
> HBase has strong durability guarantees to avoid data loss. When a write arrives at a RegionServer, data will be persisted into a Write-Ahead-Log (on HDFS) and temporarily in the RegionServer memory until the data from this memory store is flushed (also to HDFS).
>
> From the point of view of a client that is writing to HBase, if it receives a response for a successful write operation (put, delete, append, increment) then we can guarantee that data was correctly persisted to HDFS in the WAL and in case of a catastrophic failure of a RegionServer we will be able to recover as others have mentioned.
>
> For the failure mode you mention if all DNs go down (not the NN) clients will be blocked waiting for the acknowledgement of a write to the DNs and after a few retries the RS will consider there was a failure writing to the WAL, the RS will attempt to roll the WAL for a last time and if that fails at this point the RS will consider this as a fatal condition and it will shut itself down. At this point the client probably ran out of retries and will throw an exception to the application.
>
> If a single DN can recover before any of the RSs goes down, the writes will recover and the client will get the acknowledgement that data has been persisted to HDFS (even with a single DN at this point). During this period the RS logs will warn that data is getting persisted with a lower number of replicas and data could be at risk.
>
> If you are further interested in the write path in HBase there is a really good blog post from Jimmy Xiang about this topic:
> http://blog.cloudera.com/blog/2012/06/hbase-write-path
>
> best,
> esteban.
>
> --
> Cloudera, Inc.
>
> On Tue, Apr 7, 2015 at 9:04 AM, Marcelo Valle (BLOOMBERG/ LONDON) <mvallemil...@bloomberg.net> wrote:
>
> Wellington,
>
> I might be misinterpreting this:
> http://stackoverflow.com/questions/13741946/role-of-datanode-regionserver-in-hbase-hadoop-integration
>
> But aren't HBase region servers and HDFS datanodes always in the same server? With a replication factor of 3, what happens if all 3 datanodes hosting that information go down and one of them comes back, but with the disk intact?
> Considering from the time they went down to the time it went back HBase received new writes that would go to the same data node...
>
> From: user@hbase.apache.org
> Subject: Re: write availability
>
> The data is stored in files on HDFS. If a RS goes down, the master knows which regions were on that RS and which HDFS files contain data for these regions, so it will just assign the regions to other RSs, and these other RSs will have access to the regions' data because it's stored on HDFS. The RS does not "own" the disk, this is HDFS's job, so the recovery in this case is transparent.
>
> On 7 Apr 2015, at 16:51, Marcelo Valle (BLOOMBERG/ LONDON) <mvallemil...@bloomberg.net> wrote:
>
> > So if a RS goes down, it's assumed you lost the data on it, right?
> > HBase has replications on HDFS, so if a RS goes down it doesn't mean I lost all the data, as I could have the replicas yet... But what happens if all RSs hosting a specific region go down?
> > What if one RS from this one comes back again, but with the disk intact, with all the data it had before crashing?
> >
> > From: user@hbase.apache.org
> > Subject: Re: write availability
> >
> > When a RS goes down, the Master will try to assign the regions on the remaining RSes. When the RS comes back, after a while, the Master balancer process will re-distribute regions between RSes, so the given RS will be hosting regions, but not necessarily the ones it used to host before it went down.
> >
> > On 7 Apr 2015, at 16:31, Marcelo Valle (BLOOMBERG/ LONDON) <mvallemil...@bloomberg.net> wrote:
> >
> >>> So if the cluster is up, then you can insert records in to HBase even though you lost a RS that was handling a specific region.
> >>
> >> What happens when the RS goes down? Writes to that region will be written to another region server? Another RS assumes the region "range" while the RS is down?
> >>
> >> What happens when the RS that was down goes up again?
> >>
> >> From: user@hbase.apache.org
> >> Subject: Re: write availability
> >>
> >> I don’t know if I would say that…
> >>
> >> I read Marcelo’s question of “if the cluster is up, even though a RS may be down, can I still insert records in to HBase?”
> >>
> >> So if the cluster is up, then you can insert records in to HBase even though you lost a RS that was handling a specific region.
> >>
> >> But because he talked about syncing nodes… I could be misreading his initial question…
> >>
> >>> On Apr 7, 2015, at 9:02 AM, Serega Sheypak <serega.shey...@gmail.com> wrote:
> >>>
> >>>> If I have an application that writes to a HBase cluster, can I count that the cluster will always be available to receive writes?
> >>> No, it's a CP, not an AP system.
> >>>> so everything get in sync when the other nodes get up again
> >>> There is no hinted handoff, it's not Cassandra.
> >>>
> >>> 2015-04-07 14:48 GMT+02:00 Marcelo Valle (BLOOMBERG/ LONDON) <mvallemil...@bloomberg.net>:
> >>>
> >>>> If I have an application that writes to a HBase cluster, can I count that the cluster will always be available to receive writes?
> >>>> I might not be able to read if a region server which handles a range of keys is down, but will I be able to keep writing to other nodes, so everything get in sync when the other nodes get up again?
> >>>> Or I might get no write availability for a while?
> >>
> >> The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
> >> Use at your own risk.
> >> Michael Segel
> >> michael_segel (AT) hotmail.com
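
To make the MTTR arithmetic from the top of this thread concrete, here is a minimal Python sketch. It assumes a simplified model in which a region becomes writable once ZooKeeper failure detection and region allocation have finished, and readable only after the log replay has also finished. The class name, field names and timing values are made up for illustration; they are not HBase defaults or measurements.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTimings:
    """Hypothetical recovery phases for one failed RegionServer, in seconds."""
    zk_failure_detection_s: float  # time until the RS znode expires
    region_allocation_s: float     # time to open the region on another RS
    log_replay_s: float            # time to replay the WAL edits

    def write_mttr(self) -> float:
        # Writes are accepted while the edits are still being replayed.
        return self.zk_failure_detection_s + self.region_allocation_s

    def read_mttr(self) -> float:
        # Assumption: reads have to wait until the replay has completed.
        return self.write_mttr() + self.log_replay_s


# Illustrative numbers only.
timings = RecoveryTimings(
    zk_failure_detection_s=30.0,
    region_allocation_s=5.0,
    log_replay_s=60.0,
)
print("write MTTR (s):", timings.write_mttr())
print("read MTTR (s):", timings.read_mttr())
```

In this model the failure detection time dominates a write-only workload, so that is the term worth tuning first.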