Thanks, Eustache.

The link to an article I wrote for DZone is in the second reply.

Best regards,

-chanwit

--
Chanwit Kaewkasi
linkedin.com/in/chanwit


On Thu, Mar 20, 2014 at 9:39 PM, Eustache DIEMERT <eusta...@diemert.fr> wrote:
> Hey, do you have a blog post or url I can share ?
>
> This is quite a cool experiment!
>
> E/
>
>
> 2014-03-20 15:01 GMT+01:00 Chanwit Kaewkasi <chan...@gmail.com>:
>
>> Hi Chester,
>>
>> It is on our to-do list, but it doesn't work at the moment. The
>> Parallella cores cannot be utilized by the JVM, so Spark will just
>> use the board's ARM cores. We'll be looking at the Parallella again
>> when the JVM supports it.
>>
>> Best regards,
>>
>> -chanwit
>>
>> --
>> Chanwit Kaewkasi
>> linkedin.com/in/chanwit
>>
>>
>> On Thu, Mar 20, 2014 at 8:52 PM, Chester <chesterxgc...@yahoo.com> wrote:
>> > I am curious to see if you have tried this on a Parallella
>> > supercomputer (16- or 64-core) cluster; running Spark on that
>> > should be fun.
>> >
>> > Chester
>> >
>> > Sent from my iPad
>> >
>> > On Mar 19, 2014, at 9:18 AM, Chanwit Kaewkasi <chan...@gmail.com> wrote:
>> >
>> >> Hi Koert,
>> >>
>> >> There's some NAND flash built into each node. We mount the NAND
>> >> flash as a local directory for Spark to spill data to.
>> >> A DZone article, also written by me, tells more about the cluster.
>> >> We really appreciate the design of Spark's RDDs by the Spark team.
>> >> It turned out to be perfect for ARM clusters.
>> >>
>> >> http://www.dzone.com/articles/big-data-processing-arm-0
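>> >> A minimal sketch of that setup (the mount point /mnt/nand is an
>> >> example, not our actual path): Spark's spill/scratch location is
>> >> controlled by the spark.local.dir property, which can be pointed
>> >> at the NAND flash mount, e.g. in conf/spark-env.sh:
>> >>
>> >> ```shell
>> >> # conf/spark-env.sh -- direct Spark's shuffle/spill scratch space
>> >> # to the NAND flash mount (example path, assumed for illustration)
>> >> export SPARK_JAVA_OPTS="-Dspark.local.dir=/mnt/nand/spark-tmp"
>> >> ```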
>> >>
>> >> Another great thing is that our cluster can operate at room
>> >> temperature (25 C / 77 F) too.
>> >>
>> >> The board is the Cubieboard; here are its specs:
>> >> https://en.wikipedia.org/wiki/Cubieboard#Specification
>> >>
>> >> Best regards,
>> >>
>> >> -chanwit
>> >>
>> >> --
>> >> Chanwit Kaewkasi
>> >> linkedin.com/in/chanwit
>> >>
>> >>
>> >> On Wed, Mar 19, 2014 at 9:43 PM, Koert Kuipers <ko...@tresata.com>
>> >> wrote:
>> >>> I don't know anything about ARM clusters... but it looks great.
>> >>> What are the specs? Do the nodes have no local disk at all?
>> >>>
>> >>>
>> >>> On Tue, Mar 18, 2014 at 10:36 PM, Chanwit Kaewkasi <chan...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> Hi all,
>> >>>>
>> >>>> We are a small team doing research on low-power (and low-cost)
>> >>>> ARM clusters. We built a 20-node ARM cluster that is able to run
>> >>>> Hadoop.
>> >>>> But as you all know, Hadoop performs on-disk operations, so it's
>> >>>> not suitable for a constrained machine powered by ARM.
>> >>>>
>> >>>> We then switched to Spark and had to say wow!!
>> >>>>
>> >>>> Spark / HDFS enables us to crunch the Wikipedia articles (from
>> >>>> 2012), 34 GB in size, in 1h50m. We have identified the
>> >>>> bottleneck: it's our 100M network.
>> >>>>
>> >>>> Here's the cluster:
>> >>>>
>> >>>> https://dl.dropboxusercontent.com/u/381580/aiyara_cluster/Mk-I_SSD.png
>> >>>>
>> >>>> And this is what we got from Spark's shell:
>> >>>>
>> >>>> https://dl.dropboxusercontent.com/u/381580/aiyara_cluster/result_00.png
>> >>>>
>> >>>> I think it's the first ARM cluster that can process a
>> >>>> non-trivial amount of Big Data.
>> >>>> (Please correct me if I'm wrong.)
>> >>>> I really want to thank the Spark team for making this possible!!
>> >>>>
>> >>>> Best regards,
>> >>>>
>> >>>> -chanwit
>> >>>>
>> >>>> --
>> >>>> Chanwit Kaewkasi
>> >>>> linkedin.com/in/chanwit
>> >>>
>> >>>
>
>
