Hi Randall, thanks for getting in touch. I hope you're still able to contribute to CouchDB in some way or another. Maybe the community just isn't ready yet to commit itself to a de facto, formalised way of distributing the execution of CouchDB? (lol)
I will keep you up to date with my progress. I am approaching my project as a parallel distribution problem rather than as an exclusively DBMS project, and I have a university cluster at my disposal.

@Jesse: You confirmed some of my suspicions about CouchDB with regard to its mission, its scalability and its similarity to a distributed system such as Hadoop. It is very useful to be aware of the explicit map-reduce nature of CouchDB, and it is not something that will be overlooked in my study. Map-reduce has a vital role in Hadoop: it is the very core of the distribution of processing and data. Perhaps, in a time not so far away, there could be a study on the scalability and parallel performance of CouchDB, where CouchDB offers a developer these things for free(?)

Rob

2009/10/7 Jesse Hallett <[email protected]>

> One issue is that Hadoop and CouchDB are very different tools.
>
> Hadoop is great at intensive, high-latency data analysis. It doesn't matter
> how complicated the computation you want is - Hadoop will do it for you
> because it is a data processing engine.
>
> CouchDB is a database. It is designed for low-latency, high-availability
> operations. CouchDB is not a data processing engine, it is a data retrieval
> engine. It should be faster than Hadoop for tasks that both systems can
> handle; and CouchDB can perform some powerful analysis via its map-reduce
> capability. But the analysis you can perform with CouchDB will ultimately
> be limited by its low-latency design philosophy.
>
> What can be misleading is that while both Hadoop and CouchDB use map-reduce,
> they use it for very different things. It is analogous to saying "these two
> programs both use iteration over tree structures." One detail on choice of
> algorithm does not tell you what a program is designed for or what it is
> good at.
>
> CouchDB uses map-reduce to build pre-computed views of data. The map-reduce
> pattern enforces data isolation which allows CouchDB to incrementally update
> views. CouchDB does not (yet) take advantage of parallel processing when
> generating views. Though you can get parallelism by distributing data over
> a cluster and splitting queries with a proxy.
>
> Hadoop uses map-reduce to run computation in parallel and to distribute
> computation across multiple machines. The same data isolation that CouchDB
> relies on allows this. But Hadoop takes advantage of that feature
> differently.
>
> On Oct 7, 2009 7:29 AM, "Göran Krampe" <[email protected]> wrote:
>
> Nicholas Orr wrote:
>
> On Wed, Oct 7, 2009 at 11:53 PM, Rob Stewart <[email protected]>...
>
> Using an intermediate library in your language of choice you can get queries
> etc to look rather similar, take a look at this C# example program for using
> Divan:
>
> http://github.com/gokr/Divan/blob/master/samples/Trivial/Program.cs
>
> ...funny enough it also uses "Cars" as an example :). Note the LINQ
> integration which actually makes it possible to write:
>
> var fastCars = from c in linqCars where c.HorsePowers >= 175 select c;
>
> (given a view for it)
>
> regards, Göran
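To make the data-isolation point in the quoted message a bit more concrete, here is a rough sketch in Python of why a map function that sees only one document at a time allows a view to be updated incrementally. This is only an illustration under stated assumptions, not how CouchDB actually stores or updates views: the class `IncrementalView`, the function `map_fast_car` and the document fields `make` and `horsepower` are all hypothetical names, and the 175 threshold is borrowed from the LINQ example above.

```python
from collections import defaultdict


def map_fast_car(doc):
    """Map step: emit (key, value) rows for ONE document.

    It never looks at any other document, which is the data isolation
    the quoted message refers to. Field names are hypothetical.
    """
    if doc.get("horsepower", 0) >= 175:
        yield doc["make"], 1


class IncrementalView:
    """Toy pre-computed view (an assumption, not CouchDB's real design)."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows_by_doc = {}  # doc id -> rows emitted for that document
        self.map_calls = 0     # counts how often the map function ran

    def update(self, doc_id, doc):
        # Only the changed document is re-mapped; rows emitted for all
        # other documents stay valid because map saw nothing else.
        self.map_calls += 1
        self.rows_by_doc[doc_id] = list(self.map_fn(doc))

    def delete(self, doc_id):
        self.rows_by_doc.pop(doc_id, None)

    def reduce_sum(self):
        # Reduce step: combine the stored rows per key.
        totals = defaultdict(int)
        for rows in self.rows_by_doc.values():
            for key, value in rows:
                totals[key] += value
        return dict(totals)


if __name__ == "__main__":
    view = IncrementalView(map_fast_car)
    view.update("car1", {"make": "Saab", "horsepower": 210})
    view.update("car2", {"make": "Volvo", "horsepower": 140})
    view.update("car3", {"make": "Saab", "horsepower": 180})
    print(view.reduce_sum())

    # Changing one document costs one extra map call, not a full rebuild.
    view.update("car2", {"make": "Volvo", "horsepower": 200})
    print(view.reduce_sum(), view.map_calls)
```

The same isolation is what would let a system like Hadoop hand each document (or block of documents) to a different machine, since no map call depends on another; the sketch above simply uses it to skip work instead of spreading it out.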
