On Wed, Aug 1, 2012 at 9:29 AM, lars hofhansl <lhofha...@yahoo.com> wrote:

> "sync" is a fluffy term in HDFS. HDFS has hsync and hflush.
> hflush forces all current changes at a DFSClient to all replica nodes (but
> not to disk).
>
> Before HDFS-744, hsync was identical to hflush. After HDFS-744, hsync
> can be used to force data to disk at the replicas.
>
>
> When HBase refers to "sync" the hflush semantics are meant (at least until
> HBASE-5954 is finished).
> I.e. a sync here ensures that the replica nodes have seen the changes,
> which is what you want.
>
>
> So when you say "since another copy is always there on the replica nodes",
> that is only guaranteed after an hflush (again, which HBase calls sync).
>
>
> I've also written about this here:
> http://hadoop-hbase.blogspot.com/2012/05/hbase-hdfs-and-durable-sync.html
>
> -- Lars
>
>
>
Thanks, this post is very helpful.
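
To make the hflush/hsync distinction above concrete, here is a minimal sketch in Java. It is a toy in-memory model, not the HDFS client API: the ToySyncModel, ToyStream and Replica classes and the seenBy/onDisk helpers are all hypothetical names made up for illustration. Only the method names hflush and hsync mirror the real calls on an HDFS output stream. The model assumes the post-HDFS-744 behaviour, where hsync also forces data to disk at the replicas.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the hflush/hsync distinction described above.
// This is NOT the HDFS client API; every class here is hypothetical.
public class ToySyncModel {

    // One replica node: what it has received, and how much of that is on disk.
    static final class Replica {
        final StringBuilder received = new StringBuilder();
        int durableLength = 0;
    }

    // A writer that buffers locally until it is flushed.
    static final class ToyStream {
        private final StringBuilder clientBuffer = new StringBuilder();
        private final List<Replica> replicas = new ArrayList<>();

        ToyStream(int replicaCount) {
            for (int i = 0; i < replicaCount; i++) {
                replicas.add(new Replica());
            }
        }

        // Data stays in the client buffer until hflush or hsync is called.
        void write(String data) {
            clientBuffer.append(data);
        }

        // Push buffered changes to every replica's memory, but not to disk.
        void hflush() {
            for (Replica r : replicas) {
                r.received.append(clientBuffer);
            }
            clientBuffer.setLength(0);
        }

        // Assumed post-HDFS-744 semantics: flush, then force to disk at each replica.
        void hsync() {
            hflush();
            for (Replica r : replicas) {
                r.durableLength = r.received.length();
            }
        }

        // Everything the given replica has received so far.
        String seenBy(int replica) {
            return replicas.get(replica).received.toString();
        }

        // The part of the received data that the replica has forced to disk.
        String onDisk(int replica) {
            Replica r = replicas.get(replica);
            return r.received.substring(0, r.durableLength);
        }
    }

    public static void main(String[] args) {
        ToyStream out = new ToyStream(3);
        out.write("put-1");
        System.out.println("after write:  seen='" + out.seenBy(0) + "' disk='" + out.onDisk(0) + "'");
        out.hflush();
        System.out.println("after hflush: seen='" + out.seenBy(0) + "' disk='" + out.onDisk(0) + "'");
        out.hsync();
        System.out.println("after hsync:  seen='" + out.seenBy(0) + "' disk='" + out.onDisk(0) + "'");
    }
}
```

In this model, "another copy is always there on the replica nodes" only becomes true once hflush has run; before that the edit exists only in the writer's own buffer.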

>
> ________________________________
>  From: Mohit Anchlia <mohitanch...@gmail.com>
> To: user@hbase.apache.org
> Sent: Tuesday, July 31, 2012 6:09 PM
> Subject: sync on writes
>
> In the HBase book it is mentioned that the default behaviour of a write is to
> call sync on each node before sending replica copies to the nodes in the
> pipeline. Is there a reason this was kept as the default? If data is
> getting written on multiple nodes, then the likelihood of losing data is really
> low, since another copy is always there on the replica nodes. Is it OK to
> make this sync async, and is it advisable?
>
