Kevin,

I just pulled the code and read through the design.  Great stuff.

Any thought to using this for real-time processing as well?  Right 
now, we have a set of Hadoop M/R jobs that operate against Cassandra for ETL.  
We were looking at using Storm for the real-time processing side of things and 
thought that we could actually abandon Hadoop entirely if we could introduce 
Cassandra's concept of data locality to Storm.  We plan to run head-to-head 
comparisons between Storm and Hadoop to test out the viability of that approach.

Peregrine looks like another contender.

cheers,
-brian
 


On Dec 27, 2011, at 6:14 AM, Kevin Burton wrote:

> 
> A key innovation here is a partitioning layout algorithm that can support fast
> many-to-many recovery similar to HDFS but still support partitioned operation
> with deterministic key placement.
> 
> Thanks for your contribution.
> 
> Is there more detailed info on this point? 
> 
> yes... our design document:
> 
> http://peregrine_mapreduce.bitbucket.org/design/
> 
> I actually will probably write a paper on this... 
> 
> The more I started down the partitioned filesystem approach in terms of 
> mapreduce the more I realized that there were some REALLY elegant 
> implementation and design issues that I did not originally appreciate ... 
> (so I partially got lucky).
> 
> I think this approach could be generalized to work on normal map reduce jobs 
> without much overhead.
>  
> -- 
> Founder/CEO Spinn3r.com
> 
> Location: San Francisco, CA
> Skype: burtonator
> Skype-in: (415) 871-0687
> 

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
