Hi Lucas,

First, a write request in HBase consists of two parts:
1. Write into the WAL (write-ahead log);
2. Write into the MemStore. When the MemStore reaches its size threshold, the
data in the MemStore is flushed to disk.
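A toy Python sketch of these two steps, just to show the order of operations.
Everything in it is hypothetical (ToyRegion, flush_threshold, and counting
entries instead of bytes). It is not the HBase API, and real HBase does much
more, e.g. it keeps multiple versions per cell and tracks sizes in bytes.

```python
class ToyRegion:
    """Toy model of one region's write path (hypothetical, not the HBase API)."""

    def __init__(self, flush_threshold=3):
        # Hypothetical threshold counted in entries; real HBase measures
        # MemStore size in bytes (hbase.hregion.memstore.flush.size).
        self.flush_threshold = flush_threshold
        self.wal = []       # stands in for the WAL file
        self.memstore = {}  # in-memory edits, sorted only at flush time
        self.hfiles = []    # each flush produces one immutable "file"

    def put(self, row, value):
        # Step 1: append the edit to the WAL first, for durability.
        self.wal.append((row, value))
        # Step 2: apply the edit to the in-memory MemStore.
        self.memstore[row] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the MemStore contents, sorted by row key, as a new "file"
        # and start again with an empty MemStore.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}


region = ToyRegion(flush_threshold=3)
for i, row in enumerate(["c", "a", "b", "d"]):
    region.put(row, i)

print(region.hfiles)    # [[('a', 1), ('b', 2), ('c', 0)]]
print(region.memstore)  # {'d': 3}
print(len(region.wal))  # 4
```

Every put hits the WAL, but only the flush turns MemStore contents into a file.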

In my understanding, there are two data synchronization points:

The first one is the write to the WAL. The WAL is persisted on the local
disk and, because it is an HDFS file, it is also propagated to the other 2
nodes (assuming a replication factor of 3).
The second one is when the MemStore reaches its threshold and the data in the
MemStore is flushed to disk. This flush also goes through the HDFS pipelined
write.
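A toy Python model of the pipelined write that both sync points go through.
DataNode and pipeline_write are hypothetical names, not the HDFS API. "Stored"
here just means appended to a list; the sketch does not model whether a
packet is in memory or on disk, which is exactly the detail Jonathan's
comment is about.

```python
class DataNode:
    """Hypothetical stand-in for an HDFS DataNode in a write pipeline."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.received = []  # packets this node has accepted


def pipeline_write(packet, pipeline):
    """Send `packet` down the pipeline; return True only if every node acks.

    Each node stores the packet, forwards it to the next node, and its own
    ack (the return value) is only True once the downstream nodes have acked.
    """
    if not pipeline:
        return True  # end of the pipeline: nothing left to forward to
    head, rest = pipeline[0], pipeline[1:]
    if not head.alive:
        return False
    head.received.append(packet)
    return pipeline_write(packet, rest)


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]

# Sync point 1: a WAL append.  Sync point 2: a MemStore flush.
wal_ok = pipeline_write("wal: put row1", nodes)
flush_ok = pipeline_write("hfile: flushed rows", nodes)
print(wal_ok, flush_ok)   # True True
print(nodes[2].received)  # ['wal: put row1', 'hfile: flushed rows']

# If the last DataNode is down, the write is not acknowledged.
nodes[2].alive = False
print(pipeline_write("wal: put row2", nodes))  # False
```

Real HDFS would run pipeline recovery around a failed DataNode instead of
simply failing; the sketch leaves that out.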

Regards,

Yong



On Tue, Jun 11, 2013 at 2:39 AM, Lucas Stanley <lucas23...@gmail.com> wrote:

> Thanks Azuryy!
>
> So, when a write is successful to the WAL on the responsible region server,
> in fact that means that the write was committed to 3 total DataNodes,
> correct?
>
>
> On Mon, Jun 10, 2013 at 5:37 PM, Azuryy Yu <azury...@gmail.com> wrote:
>
> > Yes. DataNode writes are pipelined, and the DataNode returns OK only after
> > the pipeline write has finished.
> >
> > On Jun 11, 2013 8:27 AM, "Lucas Stanley" <lucas23...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > In the Strata 2013 training lectures, Jonathan Hsieh from Cloudera said
> > > something about HBase syncs which I'm trying to understand further.
> > >
> > > He said that HBase sync guarantees only that a write goes to the local
> > > disk on the region server responsible for that region and that in-memory
> > > copies go to 2 other machines in the HBase cluster.
> > >
> > > But I thought that when the write goes to the WAL on the first region
> > > server, the HDFS append would push that write to 3 machines total in
> > > the HDFS cluster. In order for the append write to the WAL to be
> > > successful, doesn't the DataNode on that machine have to pipeline the
> > > write to 2 other DataNodes?
> > >
> > > I'm not sure what Jonathan was referring to when he said that 2 in-memory
> > > copies go to other HBase machines. Even when the MemStore on the first
> > > region server gets full, doesn't the flush to the HFile get written to 3
> > > HDFS nodes in total?
> > >
> >
>
