On Wed, Aug 1, 2012 at 9:29 AM, lars hofhansl <lhofha...@yahoo.com> wrote:
> "sync" is a fluffy term in HDFS. HDFS has hsync and hflush.
> hflush forces all current changes at a DFSClient to all replica nodes (but
> not to disk).
>
> Until HDFS-744 hsync would be identical to hflush. After HDFS-744 hsync
> can be used to force data to disk at the replicas.
>
> When HBase refers to "sync" the hflush semantics are meant (at least until
> HBASE-5954 is finished).
> I.e. a sync here ensures that the replica nodes have seen the changes,
> which is what you want.
>
> So when you say "since another copy is always there on the replica nodes",
> that is only guaranteed after an hflush (again, which HBase calls sync).
>
> I've also written about this here:
> http://hadoop-hbase.blogspot.com/2012/05/hbase-hdfs-and-durable-sync.html
>
> -- Lars

Thanks, this post is very helpful.

> ________________________________
> From: Mohit Anchlia <mohitanch...@gmail.com>
> To: user@hbase.apache.org
> Sent: Tuesday, July 31, 2012 6:09 PM
> Subject: sync on writes
>
> In the HBase book it is mentioned that the default behaviour of a write is
> to call sync on each node before sending replica copies to the nodes in the
> pipeline. Is there a reason this was kept as the default? If data is
> getting written on multiple nodes, then the likelihood of losing data is
> really low, since another copy is always there on the replica nodes. Is it
> OK to make this sync async, and is it advisable?
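
For anyone reading this thread later, the distinction Lars draws between unflushed writes, hflush, and hsync (post HDFS-744) can be sketched with a toy model. This is not the HDFS or HBase API: `Replica`, `ToyClient`, `seen`, `power_loss` and `demo` are hypothetical names made up for illustration, and only the method names `hflush` and `hsync` are borrowed from the discussion above. It assumes a fixed three-replica pipeline and ignores networking, acknowledgements and recovery entirely.

```python
class Replica:
    """Toy stand-in for a replica node: an in-memory buffer plus a 'disk'."""

    def __init__(self):
        self.memory = []  # edits received but not yet forced to disk
        self.disk = []    # edits that have been forced to disk

    def seen(self):
        # Everything this replica currently knows about.
        return self.disk + self.memory

    def power_loss(self):
        # Anything that was only in memory is gone after a power loss.
        self.memory = []


class ToyClient:
    """Toy stand-in for a client writing to a pipeline of replicas."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.pending = []  # edits still buffered at the client

    def write(self, edit):
        # A plain write only reaches the client-side buffer.
        self.pending.append(edit)

    def hflush(self):
        # Push all buffered edits to every replica's memory, but not to disk.
        for replica in self.replicas:
            replica.memory.extend(self.pending)
        self.pending = []

    def hsync(self):
        # Modelled on the post-HDFS-744 behaviour described above:
        # flush to the replicas, then force the data to disk at each one.
        self.hflush()
        for replica in self.replicas:
            replica.disk.extend(replica.memory)
            replica.memory = []


def demo():
    replicas = [Replica() for _ in range(3)]
    client = ToyClient(replicas)

    client.write("edit-1")
    client.hsync()          # edit-1 is on disk at every replica

    client.write("edit-2")
    client.hflush()         # edit-2 is in memory at every replica

    client.write("edit-3")  # edit-3 never leaves the client buffer

    before = [r.seen() for r in replicas]
    for r in replicas:
        r.power_loss()      # all replicas lose power at once
    after = [r.seen() for r in replicas]
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("replicas before power loss:", before)
    print("replicas after power loss: ", after)
```

In this model the edit that was never flushed exists only at the client, so "another copy is always there on the replica nodes" does not hold for it. The hflush'ed edit is on every replica and survives a client crash, but not a simultaneous power loss of all replicas; only the hsync'ed edit survives that.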