I don't know the operational details of HBase, so I can't speak to that point. But I do know that Facebook hired Jonathan Gray, former CTO of Streamy, who is a huge HBase contributor. Streamy shut down in March 2010, although I'm not sure when he went to work for Facebook.
He presented on HBase at Hadoop World 2010 in NYC in October:
http://mpouttuclarke.wordpress.com/2010/10/18/notes-from-hadoop-world-2010-nyc/

Again, I don't know the chronology (whether he was hired before or after the decision to use HBase). But I know that Jonathan is a fantastically smart (and extremely nice) guy, and I'm sure he could make HBase bend to his will at any point.

Dave Viner

On Sun, Nov 21, 2010 at 4:16 PM, Todd Lipcon <t...@lipcon.org> wrote:

> On Sun, Nov 21, 2010 at 2:06 PM, Edward Ribeiro
> <edward.ribe...@gmail.com> wrote:
>
>>
>>> Also I believe saying HBase is consistent is not true. This can happen:
>>> Write to region server. -> Region Server acknowledges client -> write
>>> to WAL -> region server fails = write lost
>>>
>>> I wonder how Facebook will reconcile that. :)
>>>
>>
>> Are you sure about that? Client writes to WAL before ack user?
>>
>> According to these posts[1][2], "if writing the record to the WAL fails
>> the whole operation must be considered a failure.", so it would be nonsense
>> to acknowledge clients before writing the lifeline. I hope some Cloudera
>> guy can explain this...
>>
>>
> [only jumping in because info was requested - those who know me know that I
> think Cassandra is a very interesting architecture and a better fit for many
> applications than HBase]
>
> You can operate the commit log in two different modes in HBase. One mode is
> "deferred log flush", where the region server appends but does not sync()
> the commit log to HDFS on every write, but rather on a periodic basis (e.g.
> once a second). This is similar to the innodb_flush_log_at_trx_commit=2
> option in MySQL for example. This has slightly better performance obviously
> since the writer doesn't need to wait on the commit, but as you noted
> there's a window where a write may be acknowledged but then lost. This is an
> issue of *durability* more so than consistency.
>
> In the other mode of operation (default in recent versions of HBase) we do
> not acknowledge a write until it has been pushed to the OS buffer on the
> entire pipeline of log replicas. Obviously this is slower, but it results in
> "no lost data" regardless of any machine failures. Additionally, concurrent
> readers do not see written data until these same properties have been
> satisfied. So this mode is 100% consistent and 100% durable. In practice,
> this affects latency significantly since it adds two extra round trips to
> each write, but system throughput is only reduced by 20-30% since the
> commits are pipelined (see HDFS-895 for gory details).
>
> I believe Cassandra has similar tuning options about whether to sync every
> commit to the log or only do so periodically.
>
> If you're interested in learning more, feel free to reference this
> documentation:
> http://hbase.apache.org/docs/r0.89.20100726/acid-semantics.html
>
>
>
>> Besides that, you know that WAL is written to HDFS that takes care of
>> replication and fault tolerance, right? Of course, even so, there's a
>> "window of inconsistency" before the HLog is flushed to disk, but I don't
>> think you can dismiss this as not consistent. At most, you may classify it
>> as "eventually consistent". :)
>>
>> [1] http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
>> [2]
>> http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
>>
>> E. Ribeiro
>>
>>
>
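
For anyone trying to picture the two commit-log modes Todd describes, here is a rough toy model in Python. It is only a sketch under stated assumptions: ToyWal, sync_every_write and acked_but_lost are hypothetical names made up for illustration, not HBase classes or settings, and the model ignores HDFS replication, the pipelining in HDFS-895, and reader visibility. All it shows is the durability window: in the deferred mode, a write acknowledged after the last periodic sync is gone if the server crashes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWal:
    """Toy commit log. `sync_every_write` is a hypothetical switch, not an HBase option."""
    sync_every_write: bool
    buffered: list = field(default_factory=list)  # appended but not yet synced
    durable: list = field(default_factory=list)   # synced; survives a crash

    def write(self, record):
        """Append a record and acknowledge it to the client."""
        self.buffered.append(record)
        if self.sync_every_write:
            self.sync()  # sync-per-write mode: durable before the ack
        return True      # the acknowledgement

    def sync(self):
        """Move everything buffered so far into durable storage."""
        self.durable.extend(self.buffered)
        self.buffered.clear()

    def crash(self):
        """Simulate a server failure; return the acknowledged writes that were lost."""
        lost = list(self.buffered)
        self.buffered.clear()
        return lost


def acked_but_lost(sync_every_write, records, periodic_sync_after):
    """Write all records, run one periodic sync after N writes, then crash."""
    wal = ToyWal(sync_every_write)
    for count, record in enumerate(records, start=1):
        wal.write(record)
        if count == periodic_sync_after:
            wal.sync()  # stands in for the once-a-second timer
    return wal.crash()


if __name__ == "__main__":
    print("deferred flush lost:", acked_but_lost(False, ["a", "b", "c"], 2))
    print("sync per write lost:", acked_but_lost(True, ["a", "b", "c"], 2))
```

In the deferred run, only the write made after the periodic sync is lost, which is the "acknowledged but then lost" window from the thread; in the sync-per-write run nothing acknowledged is lost, at the cost of a sync on every write.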