On Sun, Sep 19, 2010 at 20:37, Christopher Bare
<cb...@systemsbiology.org> wrote:
> Hi Couch-potatoes,
>
> I'm investigating using CouchDB for a data mining application and
> could use some advice.

Cool! Welcome to the party.

>
> What I have in mind is sharding a collection of documents between
> several instances of CouchDB each running on their own nodes. Then, I
> want to run distributed map-reduce queries over the whole collection
> of documents. Do I understand correctly that Lounge is currently the
> way to do this?

Lounge is one way. BigCouch (just released) is another.
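
To make the general idea concrete, here is a minimal sketch in plain
Python of sharding documents by ID and merging per-shard reduce results.
It is not how Lounge or BigCouch is actually implemented: the shard
count, the `shard_for` helper, the hashing scheme, and the in-memory
lists standing in for CouchDB nodes are all assumptions made up for
illustration.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 3  # hypothetical number of CouchDB nodes


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard by hashing the document ID (illustrative scheme only)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def map_doc(doc):
    """Map step: emit (key, value) pairs, here one count per tag."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_shard(docs):
    """Run map and reduce locally on the documents of a single shard."""
    counts = Counter()
    for doc in docs:
        for key, value in map_doc(doc):
            counts[key] += value
    return counts


def rereduce(partials):
    """Merge per-shard results, as a proxy in front of the shards would."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Made-up sample documents, distributed across the in-memory "shards".
docs = [
    {"_id": "a", "tags": ["x", "y"]},
    {"_id": "b", "tags": ["x"]},
    {"_id": "c", "tags": ["y", "z"]},
]
shards = [[] for _ in range(NUM_SHARDS)]
for doc in docs:
    shards[shard_for(doc["_id"])].append(doc)

result = rereduce(reduce_shard(s) for s in shards)
```

The point of the sketch is that each node only reduces its own
documents, and a final merge step combines the partial results, so the
reduce function has to be safe to apply to its own output.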

>
> How would doing something like this with CouchDB and Lounge compare
> with using Hadoop and HBase?

I do not know that much about HBase/Hadoop. I bet someone else on the
list can add more differences, but I know of at least one: the data
model. CouchDB stores JSON documents, whereas HBase is column-oriented.
Also, if HBase relies on HDFS, then I think the HDFS name node is a
single point of failure, whereas you can configure BigCouch and Lounge
with redundancy at every level of the system.
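
To illustrate the data model difference, here is a rough sketch of how
one flat JSON document might look when flattened into column-style
cells addressed by row key and "family:qualifier". The `to_cells`
helper, the column family name "d", and the sample document are
hypothetical, and the real HBase client API is not used here.

```python
def to_cells(row_key, doc, family="d"):
    """Flatten a flat JSON-style dict into (row, 'family:qualifier', value) cells.

    Values are stringified as a stand-in for the raw bytes a column
    store would hold; the "_id" field becomes the row key instead of a cell.
    """
    return [
        (row_key, f"{family}:{field}", str(value))
        for field, value in sorted(doc.items())
        if field != "_id"
    ]


# A document as CouchDB would store it: one self-contained JSON object.
doc = {"_id": "gene-42", "name": "abc1", "score": 0.9}

# The same data as individually addressable cells.
cells = to_cells(doc["_id"], doc)
```

In the document form the whole record is read and written as a unit,
while in the cell form each field is a separate entry under the same
row key.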

-Randall
