Re: write availability

Nicolas Liochon Wed, 08 Apr 2015 08:09:58 -0700

Just to make it a little bit more complex, let me repeat what Nick
already said in this thread:


"However, a detail of our region recovery process
is that a region actually comes online for writes *before* it's available
for reads. That is, it can recover into a state that is
available-for-writes faster than it can recover into a state that is
available-for-reads. "

HBase allows you to write while it replays the edits in parallel (and stays
strongly consistent). So for a write-only load the MTTR is: ZooKeeper
failure detection + region allocation.
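
As a rough illustration of that difference, the sketch below adds up the
recovery phases for a write-only load and for a load that also reads. The
function name, parameter names and all durations are made-up placeholders
for illustration, not HBase settings or measurements.

```python
def mttr_seconds(failure_detection, region_allocation, edit_replay, write_only):
    """Return an illustrative mean time to recovery, in seconds.

    Assumes the model described above: writes are accepted as soon as the
    region has been reassigned, while reads also wait for the edit replay.
    """
    mttr = failure_detection + region_allocation
    if not write_only:
        mttr += edit_replay
    return mttr


# Placeholder timings (assumptions, not measured values).
write_mttr = mttr_seconds(30, 5, 60, write_only=True)
read_mttr = mttr_seconds(30, 5, 60, write_only=False)
print(write_mttr, read_mttr)
```

With these placeholder numbers the replay time only shows up in the read
figure, which is the point of the quote above.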

Nicolas

On Wed, Apr 8, 2015 at 4:12 PM, Marcelo Valle (BLOOMBERG/ LONDON) <
mvallemil...@bloomberg.net> wrote:

> Thank you all a lot for the answers!
>
> From: este...@cloudera.com
> Subject: Re: write availability
>
>
> --
> Cloudera, Inc.
>
>
> On Tue, Apr 7, 2015 at 10:36 AM, Marcelo Valle (BLOOMBERG/ LONDON) <
> mvallemil...@bloomberg.net> wrote:
>
> Sorry, there is something I asked wrongly because I was understanding it
> wrongly.
> 1 region server corresponds to 1 namenode and 1 write to 1 namenode will
> replicate to 3 datanodes...
>
> Not really, but I think we understood the failure mode you were curious to
> know more about :)
>
>
> So to simplify the second question, what happens to the HBase cluster when
> 1 region server is down?
>
> The simple case is something like this: The HBase Master will get a
> notification from ZooKeeper that the znode for this RS has expired and will
> start the recovery process, which will look up the existing WALs on
> HDFS for this RS and will start the distributed log splitting of these WALs
> across the cluster. Once replaying the edits (writes) found in the WALs
> completes, the HBase Master will open the regions on other RSs and reads and
> writes will be available for the clients immediately. With read replicas
> enabled, only writes will be unavailable until the log replay completes,
> and features like distributed log replay (HBASE-7006) can help
> to speed up the process. HBase provides other features like replication
> which can help you even further with HA and other disaster recovery scenarios.
>
> if you have more questions please let us know!
> esteban.
>
>
> -Marcelo
>
>
> From: Marcelo Valle (BLOOMBERG/ LONDON)
> Subject: Re: write availability
>
> Esteban,
>
> If I understood correctly what you said:
>
> > "For the failure mode you mention if all DNs go down (not the NN)
> clients will be blocked waiting for the acknowledgement of a write to the DNs
> and after a few retries the RS will consider there was a failure writing to
> the WAL, the RS will attempt to roll the WAL one last time and if that fails
> at this point the RS will consider this a fatal condition and it will
> shut itself down. At this point the client probably ran out of retries and
> will throw an exception to the application."
>
> If this scenario happens, when will my application be available to accept
> writes for that region again? When I do some manual intervention on the
> server?
>
> For example: suppose I split data by user ids, so each user is stored in a
> different region. In the scenario above, my application (and also the HBase
> cluster) would be working for some users and wouldn't be working for users
> whose user id is in a "down region" (a region where all corresponding DNs
> are down, considering 1 DN per RS). Is this right?
>
> -Marcelo.
>
> From: este...@cloudera.com
> Subject: Re: write availability
>
>
> Hello Marcelo,
>
> HBase has strong durability guarantees to avoid data loss. When a write
> arrives at a RegionServer, the data will be persisted into a Write-Ahead-Log
> (on HDFS) and temporarily in the RegionServer memory until the data from this
> memory store is flushed (also to HDFS).
>
> From the point of view of a client that is writing to HBase, if it
> receives a response for a successful write operation (put, delete, append,
> increment) then we can guarantee that data was correctly persisted to HDFS
> in the WAL and in case of a catastrophic failure of a RegionServer we will
> be able to recover as others have mentioned.
>
> For the failure mode you mention if all DNs go down (not the NN) clients
> will be blocked waiting for the acknowledgement of a write to the DNs and
> after a few retries the RS will consider there was a failure writing to the
> WAL, the RS will attempt to roll the WAL one last time and if that fails at
> this point the RS will consider this a fatal condition and it will shut
> itself down. At this point the client probably ran out of retries and will
> throw an exception to the application.
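
What the application sees in that failure mode can be sketched as a bounded
retry loop. This is a plain Python simulation, not the HBase Java client:
`write_with_retries`, `WriteFailed` and the `put` callable are hypothetical
names, and the retry count and pause are arbitrary placeholders.

```python
import time


class WriteFailed(Exception):
    """Raised to the application once every retry has been used up."""


def write_with_retries(put, max_retries=3, pause_seconds=0.0):
    """Call put() until it succeeds or the retries run out.

    put is any zero-argument callable that raises IOError on failure
    (a stand-in for a write whose WAL sync was not acknowledged).
    """
    last_error = None
    for attempt in range(1 + max_retries):
        try:
            return put()
        except IOError as err:
            last_error = err
            # Simple linear pause between attempts (an assumption).
            time.sleep(pause_seconds * (attempt + 1))
    raise WriteFailed(f"gave up after {max_retries} retries") from last_error


# A write that fails twice and then succeeds, as if one DN came back in time.
calls = []

def flaky_put():
    calls.append(1)
    if len(calls) < 3:
        raise IOError("no DataNode acknowledged the WAL write")
    return "ok"

result = write_with_retries(flaky_put)
```

If `put` never succeeds, the loop ends by raising `WriteFailed`, which is the
"exception to the application" case described above.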
>
> If a single DN can recover before any of the RSs goes down, the writes
> will recover and the client will get the acknowledgement that data has been
> persisted to HDFS (even with a single DN at this point). During this period
> the RS logs will warn that data is getting persisted with a lower number of
> replicas and data could be at risk.
>
> If you are further interested in the write path in HBase there is a really
> good blog post from Jimmy Xiang about this topic:
> http://blog.cloudera.com/blog/2012/06/hbase-write-path
>
> best,
> esteban.
>
>
> --
> Cloudera, Inc.
>
>
> On Tue, Apr 7, 2015 at 9:04 AM, Marcelo Valle (BLOOMBERG/ LONDON) <
> mvallemil...@bloomberg.net> wrote:
>
> Wellington,
>
> I might be misinterpreting this:
> http://stackoverflow.com/questions/13741946/role-of-datanode-regionserver-in-hbase-hadoop-integration
>
> But aren't HBase region servers and HDFS datanodes always in the same
> server? With a replication factor of 3, what happens if all 3 datanodes
> hosting that information go down and one of them comes back, but with the
> disk intact? Considering that between the time they went down and the time it
> came back HBase received new writes that would go to the same data node...
>
>
> From: user@hbase.apache.org
> Subject: Re: write availability
>
> The data is stored in files on HDFS. If a RS goes down, the master knows
> which regions were on that RS and which HDFS files contain data for these
> regions, so it will just assign the regions to other RSs, and these other
> RSs will have access to the regions' data because it's stored on HDFS. The RS
> does not "own" the disk, this is HDFS's job, so the recovery in this case is
> transparent.
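
The reassignment idea can be shown with a toy model. The round-robin choice
below is an assumption made for illustration and is not the Master's actual
assignment policy; `reassign_regions` and the region and server names are
hypothetical.

```python
from itertools import cycle


def reassign_regions(assignments, dead_server):
    """Return a new region -> server map with the dead server's regions moved.

    Only the mapping changes. No data is copied, because in this model the
    region files live on shared storage (HDFS), not on the server itself.
    Only servers that already host a region count as candidates here.
    """
    survivors = sorted({s for s in assignments.values() if s != dead_server})
    if not survivors:
        raise RuntimeError("no region servers left to host the regions")
    targets = cycle(survivors)
    return {
        region: next(targets) if server == dead_server else server
        for region, server in assignments.items()
    }


before = {"users-a": "rs1", "users-b": "rs2", "users-c": "rs1", "users-d": "rs3"}
after = reassign_regions(before, "rs1")
```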
>
>
> On 7 Apr 2015, at 16:51, Marcelo Valle (BLOOMBERG/ LONDON) <
> mvallemil...@bloomberg.net> wrote:
>
> > So if a RS goes down, it's assumed you lost the data on it, right?
> > HBase has replications on HDFS, so if a RS goes down it doesn't mean I
> lost all the data, as I could still have the replicas... But what happens if
> all RSs hosting a specific region go down?
> > What if one RS from this one comes back again, but with the disk intact,
> with all the data it had before crashing?
> >
> >
> > From: user@hbase.apache.org
> > Subject: Re: write availability
> >
> > When a RS goes down, the Master will try to assign the regions on the
> remaining RSes. When the RS comes back, after a while, the Master balancer
> process will re-distribute regions between RS, so the given RS will be
> hosting regions, but not necessarily the one it used to host before it went
> down.
> >
> >
> > On 7 Apr 2015, at 16:31, Marcelo Valle (BLOOMBERG/ LONDON) <
> mvallemil...@bloomberg.net> wrote:
> >
> >>> So if the cluster is up, then you can insert records in to HBase even
> though you lost a RS that was handling a specific region.
> >>
> >> What happens when the RS goes down? Writes to that region will be
> written to another region server? Another RS assumes the region "range"
> while the RS is down?
> >>
> >> What happens when the RS that was down goes up again?
> >>
> >>
> >> From: user@hbase.apache.org
> >> Subject: Re: write availability
> >>
> >> I don’t know if I would say that…
> >>
> >> I read Marcelo’s question of “if the cluster is up, even though a RS
> may be down, can I still insert records in to HBase?”
> >>
> >> So if the cluster is up, then you can insert records in to HBase even
> though you lost a RS that was handling a specific region.
> >>
> >> But because he talked about syncing nodes… I could be misreading his
> initial question…
> >>
> >>> On Apr 7, 2015, at 9:02 AM, Serega Sheypak <serega.shey...@gmail.com>
> wrote:
> >>>
> >>>> If I have an application that writes to a HBase cluster, can I count
> that
> >>> the cluster will always be available to receive writes?
> >>> No, it's a CP, not an AP system.
> >>>> so everything gets in sync when the other nodes come up again
> >>> There is no hinted handoff, it's not Cassandra.
> >>>
> >>>
> >>>
> >>> 2015-04-07 14:48 GMT+02:00 Marcelo Valle (BLOOMBERG/ LONDON) <
> >>> mvallemil...@bloomberg.net>:
> >>>
> >>>> If I have an application that writes to a HBase cluster, can I count
> that
> >>>> the cluster will always be available to receive writes?
> >>>> I might not be able to read if a region server which handles a range
> of
> >>>> keys is down, but will I be able to keep writing to other nodes, so
> >>>> everything gets in sync when the other nodes come up again?
> >>>> Or I might get no write availability for a while?
> >>
> >> The opinions expressed here are mine, while they may reflect a
> cognitive thought, that is purely accidental.
> >> Use at your own risk.
> >> Michael Segel
> >> michael_segel (AT) hotmail.com
> >
> >
>
>
>
