On Sun, Sep 19, 2010 at 20:37, Christopher Bare <cb...@systemsbiology.org> wrote:
> Hi Couch-potatoes,
>
> I'm investigating using CouchDB for a data mining application and
> could use some advice.

Cool! Welcome to the party.

> What I have in mind is sharding a collection of documents between
> several instances of CouchDB each running on their own nodes. Then, I
> want to run distributed map-reduce queries over the whole collection
> of documents. Do I understand correctly that Lounge is currently the
> way to do this?

Lounge is one way. BigCouch (just released) is another.

> How would doing something like this with CouchDB and Lounge compare
> with using Hadoop and HBase?

I do not know that much about HBase/Hadoop. I bet someone else on the
list can add more differences, but I know at least there is a data
model difference: CouchDB uses JSON documents but HBase is column
oriented. Also, if HBase relies on HDFS then I think the HDFS name node
is a single point of failure, whereas you can configure BigCouch and
Lounge with redundancy at every level of the system.

-Randall
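
For readers new to the idea, here is a rough sketch of what "shard the documents, then run map-reduce over the whole collection" can look like. It is only an illustration in plain Python, not how Lounge or BigCouch are actually implemented. The function names (`shard_for`, `query_shard`, `merge_partials`, `distributed_query`), the CRC32-based placement, and the `tags` field are all assumptions made up for the example. Each shard runs map and reduce over its own documents, and a final step re-reduces the partial results, which only works when the reduce function (here a sum) can be applied to its own output.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id, num_shards):
    # Hypothetical placement rule: hash the document ID with CRC32
    # (stable across runs) and take it modulo the shard count.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def map_doc(doc):
    # Map step: emit (tag, 1) for every tag in a JSON-like document.
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_values(values):
    # Reduce step: a sum, so it can safely be re-applied to partial sums.
    return sum(values)


def query_shard(docs):
    # Run map and reduce over the documents held by a single shard.
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(vals) for key, vals in grouped.items()}


def merge_partials(partials):
    # Combine per-shard results by reducing the partial values per key.
    grouped = defaultdict(list)
    for partial in partials:
        for key, value in partial.items():
            grouped[key].append(value)
    return {key: reduce_values(vals) for key, vals in grouped.items()}


def distributed_query(docs, num_shards):
    # Simulate the whole flow in one process: place, query, merge.
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(doc["_id"], num_shards)].append(doc)
    return merge_partials(query_shard(shard) for shard in shards)
```

Because the sum is associative, the merged answer should not depend on how many shards the documents were spread across; in a real deployment each `query_shard` call would be a view request to a separate CouchDB node.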