Chanwit, that is awesome! Improvements to shuffle operations should make things even better for you. Great to see a data point on ARM.
Sent while mobile. Pls excuse typos etc.

On Mar 18, 2014 7:36 PM, "Chanwit Kaewkasi" <chan...@gmail.com> wrote:
> Hi all,
>
> We are a small team doing research on low-power (and low-cost) ARM
> clusters. We built a 20-node ARM cluster that is able to run Hadoop.
> But as you all know, Hadoop performs on-disk operations, so it's not
> suitable for constrained machines powered by ARM.
>
> We then switched to Spark and had to say wow!!
>
> Spark / HDFS enabled us to crunch the Wikipedia articles (of year
> 2012), 34 GB in size, in 1h50m. We have identified the bottleneck,
> and it's our 100M network.
>
> Here's the cluster:
> https://dl.dropboxusercontent.com/u/381580/aiyara_cluster/Mk-I_SSD.png
>
> And this is what we got from Spark's shell:
> https://dl.dropboxusercontent.com/u/381580/aiyara_cluster/result_00.png
>
> I think it's the first ARM cluster that can process a non-trivial
> amount of Big Data.
> (Please correct me if I'm wrong)
> I really want to thank the Spark team for making this possible!!
>
> Best regards,
>
> -chanwit
>
> --
> Chanwit Kaewkasi
> linkedin.com/in/chanwit