Hey y'all,

There've been a few questions about distributed database solutions (a
partial list: HBase, Voldemort, Memcached, ThruDB, CouchDB, Ringo, Scalaris,
Kai, Dynomite, Cassandra, Hypertable, as well as the closed Dynamo,
BigTable, SimpleDB).

For someone using Hadoop at scale, what aspects of a problem would recommend
one of these over another?
And in your subjective judgement, do any of them seem especially likely to
succeed?

Richard Jones of Last.fm just posted an overview with a great deal of
engineering insight:

  http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/

His focus is a production web server farm, and so in some ways orthogonal to
the crowd here -- but still highly recommended.  Swaroop CH of Yahoo wrote a
broad introduction to distributed DBs I also found useful:

  http://www.swaroopch.com/notes/Distributed_Storage_Systems

Both give HBase short shrift, though my impression is that it is the leader
among open projects for massive unordered dataset problems. Still, the answer
doesn't seem to be a simple "If you're using Hadoop you should be using
HBase, dummy."

I don't have the expertise to write this kind of overview from the Hadoop /
big data perspective, but I would eagerly read such an article from someone
who does, or would be happy to summarize the insights of the list.

===

In lieu of such a summary for now, here are pointers to a few relevant threads:
* http://www.nabble.com/Why-is-scaling-HBase-much-simpler-then-scaling-a-relational-db--tt18869660.html#a19093685
  (especially Jonathan Gray's breakdown)
* "HBase Performance"
  http://www.mail-archive.com/hadoop-u...@lucene.apache.org/msg02540.html
  (and the paper by Stonebraker and friends:
  http://www.vldb.org/conf/2007/papers/industrial/p1150-stonebraker.pdf)
* http://www.nabble.com/Serving-contents-of-large-MapFiles-SequenceFiles-from-memory-across-many-machines-tt19546012.html#a19574917
* On specific problem domains:
  http://www.nabble.com/Indexed-Hashtables-tt21470024.html#a21470848
  http://www.nabble.com/Why-can%27t-Hadoop-be-used-for-online-applications---tt19461962.html#a19471894
  http://www.nabble.com/Architecture-question.-tt21100766.html#a21100766

flip

(Noted in passing: a huge proportion of the development seems to be coming
out of commercial enterprises and not the academic/HPC community. I worry my
ivory tower is hung up on big iron and the top500.org list, at the expense
of solving the many interesting problems these systems unlock.)
-- 
http://www.infochimps.org
Connected Open Free Data
