The flaws in the paper are obvious if you look for them:

- Their solution doesn't run on disk. Many things get faster when you restrict yourself to RAM/flash.
- Their solution doesn't scale! It looks like shared-nothing sharding with global transaction ordering and no internal locks.

Or am I missing something big here? I generally find it tiresome when people bash Bigtable, yet their "awesome" thing doesn't scale to multi-PB databases. It reminds me of that "time for an architectural rewrite" argument, which was essentially "if you do everything in 1 thread/CPU you don't need locks and are faster". This looks like the same thing, as far as I can tell from skimming the paper.

On Thu, Sep 2, 2010 at 12:36 PM, Andrew Purtell <apurt...@apache.org> wrote:
> I've tried to post the below comment twice at
>
> The problems with ACID, and how to fix them without going NoSQL
>
> http://dbmsmusings.blogspot.com/2010/08/problems-with-acid-and-how-to-fix-them.html
>
> For whatever reason, it has appeared in the comments section from my
> perspective briefly twice and then disappeared twice, so I will just post it
> here, because HBase is mentioned in the article a few times, and ... well,
> just read. :-)
>
> >>>
>
> Many earlier comments have covered much of what I would say. However, nobody
> to date has raised an objection to the mildly offensive contention that "the
> NoSQL decision to give up on ACID is the lazy solution to these scalability
> and replication issues." Possibly this was not meant in the pejorative sense,
> but it reads that way. I would argue the correct term of art here is
> pragmatism, not laziness.
>
> I am a contributor to the HBase project. HBase is an open source
> implementation of the BigTable architecture. Indeed our system does scale out
> by substantially relaxing the scope of ACID guarantees. But it is a gross
> generalization to suggest "NoSQL" is "NoACID", and somehow lazy in the
> pejorative sense, and this mars the argument of the authors. HBase at least
> in particular provides durability, row-level atomicity (agree here this is a
> nice convenient partition), and favors strong consistency in its design
> choices. In this regard, I would also like to bring to your attention that
> the authors made an error describing the scope of transactional atomicity
> available in BigTable -- the scope is actually the row, not each individual
> KV.
>
> Also, at least HBase in particular is a big project with several interesting
> design/research directions and so does not reduce to a convenient stereotype:
> a transactional layer that provides global ACID properties at user option
> (that does not scale out like the underlying system but is nonetheless
> available), exploration of notions of referential integrity, even
> consideration of optional relaxed consistency (read replicas) in the other
> direction.
>
> Back to the matter of pragmatism: While it is likely most structured data
> store users are not building systems on the scale of a globally distributed
> search engine, actually that is not too far off the mark for the design
> targets of some HBase installations. We indeed do need to work with very
> large mutating data sets today and nothing in the manner of a traditional
> relational database system is up to the task. The discussion here, while
> intriguing, is also rendered fairly academic by the "horrible" performance if
> spinning media is used. Flash will not be competitive with spinning media at
> high tera- or peta-scale for at least several years yet. Other commenters
> have also noticed apparent bottlenecks in the presented design which suggest
> a high scale implementation will be problematic.
>
> Anyway, it is my belief we are attacking the same set of problems but are
> starting at it on opposing sides of a continuum and, ultimately, we shall
> meet up somewhere in the middle.
>
> September 2, 2010 10:55 AM
>
> <<<
>
> - Andy