Kevin, I just pulled the code and read through the design. Great stuff.
Any thought to potentially using this for real-time processing as well?

Right now, we have a set of Hadoop M/R jobs that operate against Cassandra for ETL. We were looking at using Storm for the real-time processing side of things and thought that we could actually abandon Hadoop entirely if we could introduce Cassandra's concept of data locality to Storm. We plan to run head-to-head comparisons between Storm and Hadoop to test out the viability of that approach. Peregrine looks like another contender.

cheers,
-brian

On Dec 27, 2011, at 6:14 AM, Kevin Burton wrote:

>
> A key innovation here is a partitioning layout algorithm that can support fast
> many-to-many recovery similar to HDFS but still support partitioned operation
> with deterministic key placement.
>
> Thanks for your contribution.
>
> Is there more detailed info on this point?
>
> yes... our design document:
>
> http://peregrine_mapreduce.bitbucket.org/design/
>
> I actually will probably write a paper on this...
>
> The more I started down the partitioned filesystem approach in terms of
> mapreduce the more I realized that there were some REALLY elegant
> implementation and design issues that I did not originally appreciate ...
> (so I partially got lucky).
>
> I think this approach could be generalized to work on normal map reduce jobs
> without much overhead.
>
> --
> Founder/CEO Spinn3r.com
>
> Location: San Francisco, CA
> Skype: burtonator
> Skype-in: (415) 871-0687
>

--
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/