I've tried twice to post the comment below at 

    The problems with ACID, and how to fix them without going NoSQL

For whatever reason, it briefly appeared in the comments section both times 
and then disappeared, so I will just post it here, because HBase is mentioned 
in the article a few times, and ... well, just read. :-)


Many earlier comments have covered much of what I would say. However, nobody to 
date has raised an objection to the mildly offensive contention that "the NoSQL 
decision to give up on ACID is the lazy solution to these scalability and 
replication issues." Possibly this was not meant in the pejorative sense, but 
it reads that way. I would argue the correct term of art here is pragmatism, 
not laziness. 

I am a contributor to the HBase project. HBase is an open source implementation 
of the BigTable architecture. Indeed our system does scale out by substantially 
relaxing the scope of ACID guarantees. But it is a gross generalization to 
suggest "NoSQL" is "NoACID", and somehow lazy in the pejorative sense, and this 
mars the argument of the authors. HBase in particular provides durability and 
row-level atomicity (I agree this is a nice, convenient partition), and favors 
strong consistency in its design choices. In this regard, I would also like to 
bring to your attention that the authors made an error describing the scope of 
transactional atomicity available in BigTable -- the scope is actually the 
row, not each individual KV. 
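
To make the row-versus-KV distinction concrete, here is a toy sketch in 
Python. It is not HBase or BigTable code: ToyTable, put, and get are 
hypothetical names, and the per-row lock is only an assumed stand-in for 
whatever mechanism a real store uses. It shows a multi-column write to one row 
being applied all-or-nothing, with no such guarantee across rows.

```python
import threading


class ToyTable:
    """Toy in-memory table (hypothetical; models row-scoped atomicity only)."""

    def __init__(self):
        self._rows = {}                  # row key -> {column: value}
        self._locks = {}                 # row key -> lock for that row
        self._guard = threading.Lock()   # protects the lock table itself

    def _row_lock(self, row_key):
        with self._guard:
            return self._locks.setdefault(row_key, threading.Lock())

    def put(self, row_key, cells):
        """Apply every cell in `cells` to one row, or none of them."""
        with self._row_lock(row_key):
            # Build the new row on the side; publish it only if all cells are valid.
            updated = dict(self._rows.get(row_key, {}))
            for column, value in cells.items():
                if value is None:
                    raise ValueError(f"no value for column {column!r}")
                updated[column] = value
            self._rows[row_key] = updated

    def get(self, row_key):
        """Return a copy of the row; readers never see a half-applied put."""
        with self._row_lock(row_key):
            return dict(self._rows.get(row_key, {}))


table = ToyTable()
table.put("row1", {"info:name": "andy", "info:project": "hbase"})
```

Two puts to different rows take different locks, so nothing here makes them 
atomic as a pair; that is the relaxation being discussed.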

Also, HBase in particular is a big project with several interesting 
design/research directions, and so does not reduce to a convenient stereotype. 
These include a transactional layer that provides global ACID properties at 
user option (it does not scale out like the underlying system but is 
nonetheless available), exploration of notions of referential integrity, and, 
in the other direction, consideration of optional relaxed consistency (read 
replicas). 

Back to the matter of pragmatism: While most structured data store users are 
likely not building systems on the scale of a globally distributed search 
engine, that scale is not too far off the mark for the design targets of 
some HBase installations. We do need to work with very large mutating 
data sets today and nothing in the manner of a traditional relational database 
system is up to the task. The discussion here, while intriguing, is also 
rendered fairly academic by the "horrible" performance if spinning media is 
used. Flash will not be competitive with spinning media at high tera- or 
peta-scale for at least several years yet. Other commenters have also noticed 
apparent bottlenecks in the presented design which suggest a high scale 
implementation will be problematic.

Anyway, it is my belief that we are attacking the same set of problems but are 
starting from opposite ends of a continuum and, ultimately, we shall meet up 
somewhere in the middle. 

September 2, 2010 10:55 AM  


   - Andy


