Re: distributed map-reduce views

Paul Davis Mon, 20 Sep 2010 13:52:46 -0700

> How would doing something like this with CouchDB and Lounge compare
> with using Hadoop and HBase?


Remember that CouchDB and Hadoop serve different purposes. CouchDB is
a data store, whereas Hadoop is a data processing platform. While
they both have "MapReduce" functionality, they aren't quite the same
thing.

In CouchDB, when we use Map/Reduce, we create a single persistent
index of data using map and reduce operators. These indexes can then
be queried using single key or range lookups. Because of the
properties of Map/Reduce we're capable of updating these indexes
incrementally.
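
To make the incremental part concrete, here's a toy sketch in Python.
It is not CouchDB's implementation or API: ViewIndex, update, lookup
and range_lookup are names I made up, and the index lives in plain
dicts rather than on disk. The point is only that when one document
changes, just the keys that document emitted need to be re-reduced.

```python
class ViewIndex:
    """Toy map/reduce index (hypothetical, not CouchDB's real code)."""

    def __init__(self, map_fn, reduce_fn):
        self.map_fn = map_fn        # doc -> iterable of (key, value) rows
        self.reduce_fn = reduce_fn  # list of values -> reduced value
        self.emitted = {}           # doc id -> rows that doc emitted
        self.values = {}            # key -> {doc id: [values]}
        self.reduced = {}           # key -> current reduced value

    def update(self, doc_id, doc):
        """Index a new revision of one document; doc=None means deleted."""
        touched = set()
        # Drop the rows contributed by the previous revision of this doc.
        for key, _ in self.emitted.pop(doc_id, []):
            self.values[key].pop(doc_id, None)
            touched.add(key)
        # Run map on the new revision only, never on the whole database.
        rows = list(self.map_fn(doc)) if doc is not None else []
        for key, value in rows:
            self.values.setdefault(key, {}).setdefault(doc_id, []).append(value)
            touched.add(key)
        if rows:
            self.emitted[doc_id] = rows
        # Re-reduce only the keys this document touched.
        for key in touched:
            vals = [v for per_doc in self.values[key].values() for v in per_doc]
            if vals:
                self.reduced[key] = self.reduce_fn(vals)
            else:
                del self.values[key]
                self.reduced.pop(key, None)

    def lookup(self, key):
        """Single-key query against the index."""
        return self.reduced.get(key)

    def range_lookup(self, start, end):
        """Range query: reduced rows with start <= key <= end, in key order."""
        return [(k, self.reduced[k]) for k in sorted(self.reduced)
                if start <= k <= end]


def by_author(doc):
    # Example map function: one row per document, keyed by author.
    yield doc["author"], 1


index = ViewIndex(by_author, sum)
index.update("a", {"author": "paul"})
index.update("b", {"author": "randal"})
index.update("c", {"author": "paul"})
index.update("c", {"author": "randal"})  # only "paul" and "randal" are re-reduced
print(index.lookup("paul"), index.range_lookup("a", "z"))
```

The index sticks around between updates and can be queried at any
time, which is the "data store" half of the comparison.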

Hadoop, on the other hand, is meant to handle arbitrary pipelines of
data processing. I.e., users can configure Hadoop to run multiple stages
of Map/Reduce in order to produce some desired output. The
intermediate stages are not intended to be persistent and queryable.
I'm not familiar enough to know how people use HBase in conjunction
with Hadoop, other than that I believe it's generally a data source. I
don't know if it stores intermediate results or not. As far as I know,
Hadoop doesn't provide incremental indexing.
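
For contrast, here's the pipeline style as a rough Python sketch. This
is plain in-memory Python, not the Hadoop API: run_stage and
run_pipeline are hypothetical helpers I'm using for illustration. Each
stage is a full batch pass over its input, and the output of one stage
only exists to feed the next one.

```python
from itertools import groupby
from operator import itemgetter


def run_stage(records, map_fn, reduce_fn):
    """Run one batch map/reduce pass over all records (hypothetical helper)."""
    mapped = [pair for record in records for pair in map_fn(record)]
    mapped.sort(key=itemgetter(0))  # the "shuffle": bring equal keys together
    return [reduce_fn(key, [value for _, value in group])
            for key, group in groupby(mapped, key=itemgetter(0))]


def run_pipeline(records, stages):
    """Chain stages; each intermediate result is dropped once it is consumed."""
    for map_fn, reduce_fn in stages:
        records = run_stage(records, map_fn, reduce_fn)
    return records


# Stage 1: count how often each word occurs.
def split_words(line):
    for word in line.split():
        yield word, 1


def sum_counts(word, counts):
    return word, sum(counts)


# Stage 2: group words by how often they occurred.
def by_count(word_and_count):
    word, count = word_and_count
    yield count, word


def collect_words(count, words):
    return count, sorted(words)


stages = [(split_words, sum_counts), (by_count, collect_words)]
result = run_pipeline(["a b a", "b c"], stages)
print(result)
```

Only the final result comes out the other end. If the input changes,
this sketch has to run the whole pipeline again from scratch.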

As Randal points out, there are various differences in implementation,
but it's also important to understand the data store vs. data
processing differences of the two systems.

HTH,
Paul Davis
