I guess you have to understand the difference in architecture. I don't know
much about C++ MPI, but it is basically MPI, whereas Spark is inspired by
Hadoop MapReduce and optimised for reading/writing large amounts of data
with a smart caching and locality strategy. Intuitively, if you have a high
CPU/message ratio, then MPI might be better. But what counts as a high
ratio is hard to say, and in the end it will depend on your specific
application. Finally, in real life, this performance difference due to the
architecture may not be the only, or even the most important, factor in the
choice, as Michael already explained.
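A rough way to place your own application on the CPU/message scale mentioned above is to profile one representative task and divide its CPU time by the number of messages it exchanges. This is a minimal sketch that assumes you already have those two figures from a profiler; the function name `cpu_per_message` and the sample profiles are hypothetical, not measurements:

```python
def cpu_per_message(cpu_seconds: float, messages: int) -> float:
    """Return CPU seconds spent per message exchanged.

    A higher value means the task is more compute-bound relative to its
    communication. Inputs are assumed to come from your own profiling.
    """
    if messages <= 0:
        raise ValueError("messages must be positive")
    return cpu_seconds / messages


# Hypothetical profiles: (CPU seconds per task, messages per task).
profiles = {
    "dense simulation step": (120.0, 40),
    "log aggregation pass": (3.0, 6000),
}

for name, (cpu, msgs) in profiles.items():
    print(f"{name}: {cpu_per_message(cpu, msgs):.4f} CPU s/message")
```

Comparing this number across candidate workloads is more informative than reading anything into a single absolute value, since where "high" begins is exactly the application-dependent part.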

Bertrand

On Mon, Jun 16, 2014 at 1:23 PM, Michael Cutler <mich...@tumra.com> wrote:

> Hello Wei,
>
> I speak from experience of writing many distributed HPC applications using
> Open MPI (C/C++) on x86, PowerPC and Cell B.E. processors, and with Parallel
> Virtual Machine (PVM) well before that, back in the '90s.  I can say with
> absolute certainty:
>
> *Any gains you expect because "C++ is faster than Java/Scala" will be
> completely wiped out by the inordinate amount of time you spend debugging
> your code and/or reinventing the wheel to do even basic tasks like linear
> regression.*
>
>
> There are undoubtedly some very specialised use cases where MPI and its
> brethren still dominate for High Performance Computing tasks -- for
> example, the nuclear decay simulations run by the US Department of Energy
> on supercomputers, where they've invested billions solving that use case.
>
> Spark is part of the wider "Big Data" ecosystem, and its biggest
> advantages are traction amongst internet-scale companies, hundreds of
> developers contributing to it, and a community of thousands using it.
>
> Need a distributed fault-tolerant file system? Use HDFS.  Need a
> distributed/fault-tolerant message-queue? Use Kafka.  Need to co-ordinate
> between your worker processes? Use ZooKeeper.  Need to run it on a flexible
> grid of computing resources and handle failures? Run it on Mesos!
>
> The barrier to entry to get going with Spark is very low: download the
> latest distribution and start the Spark shell.  Language bindings for Scala
> / Java / Python are excellent, meaning you spend less time writing
> boilerplate code and more time solving problems.
>
> Even if you believe you *need* to use native code to do something
> specific, like fetching HD video frames from satellite video capture cards
> -- wrap it in a small native library and use the Java Native Access
> interface to call it from your Java/Scala code.
>
> Have fun, and if you get stuck we're here to help!
>
> MC
>
>
> On 16 June 2014 08:17, Wei Da <xwd0...@gmail.com> wrote:
>
>> Hi guys,
>> We are choosing between C++ MPI and Spark. Is there any official
>> comparison between them? Thanks a lot!
>>
>> Wei
>>
>
>
